Why is this struct size 3 instead of 2? - c++

I have defined this struct:
typedef struct
{
    char A:3;
    char B:3;
    char C:3;
    char D:3;
    char E:3;
} col;
The sizeof(col) gives me an output of 3, but shouldn't it be 2? If I comment out just one element, the sizeof is 2. I don't understand why: five elements of 3 bits each equal 15 bits, and that's less than 2 bytes.
Is there an "internal size" when defining a structure like this one? I just need a clarification, because from my knowledge of the language so far, I expected a size of 2 bytes, not 3.

Because you are using char as the underlying type for your fields, the compiler tries to group bits by bytes, and since it cannot put more than eight bits in each byte, it can only store two fields per byte.
The total sum of bits your struct uses is 15, so the ideal size to fit that much data would be a short.
#include <stdio.h>

typedef struct
{
    char A:3;
    char B:3;
    char C:3;
    char D:3;
    char E:3;
} col;

typedef struct {
    short A:3;
    short B:3;
    short C:3;
    short D:3;
    short E:3;
} col2;

int main(){
    printf("size of col: %zu\n", sizeof(col));   /* %zu is the correct format for size_t */
    printf("size of col2: %zu\n", sizeof(col2));
}
The above code (on a 64-bit platform like mine) will indeed yield 2 for the second struct. For anything larger than a short, the struct will fill no more than one element of the used type, so, for that same platform, the struct will end up with size 4 for int, 8 for long, etc.
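For illustration, here is a sketch with hypothetical col3 and col4 variants (bit fields of type long are a common compiler extension; the sizes in the comments assume a typical LP64 platform and are implementation-defined):

typedef struct {
    int A:3;
    int B:3;
    int C:3;
    int D:3;
    int E:3;
} col3;   /* all five fields share one int: sizeof(col3) is typically 4 */

typedef struct {
    long A:3;
    long B:3;
    long C:3;
    long D:3;
    long E:3;
} col4;   /* one long: sizeof(col4) is typically 8 on LP64 */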

Because you can't have a packed bit field that spans across the minimum alignment boundary (which is 1 byte here), they'll probably get packed like:
byte 1:
    A       : 3 bits
    B       : 3 bits
    padding : 2 bits
byte 2:
    C       : 3 bits
    D       : 3 bits
    padding : 2 bits
byte 3:
    E       : 3 bits
    padding : 5 bits
(the order of fields/padding inside the same byte is not meant to be exact; it's just to give you the idea, since the compiler may lay them out however it prefers)
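One way to see the actual layout (which is implementation-defined; this sketch just dumps the raw bytes) is to give each field a distinctive value:

#include <stdio.h>
#include <string.h>

typedef struct
{
    char A:3;
    char B:3;
    char C:3;
    char D:3;
    char E:3;
} col;

int main(void)
{
    col c;
    memset(&c, 0, sizeof c);
    c.A = 1; c.B = 2; c.C = 3; c.D = 1; c.E = 2;

    unsigned char raw[sizeof c];
    memcpy(raw, &c, sizeof c);              /* copy out to inspect the bytes */
    for (size_t n = 0; n < sizeof c; n++)
        printf("byte %zu: 0x%02X\n", n, raw[n]);
    return 0;
}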

The first two bit fields fit into a single char. The third cannot fit into that char and needs a new one: 3 + 3 + 3 = 9 bits, which doesn't fit into an 8-bit char.
So the first pair takes a char, the second pair takes a char, and the last bit field gets a third char.

Most compilers allow you to control the padding, e.g. using #pragmas. Here's an example with GCC 4.8.1:
#include <stdio.h>

typedef struct
{
    char A:3;
    char B:3;
    char C:3;
    char D:3;
    char E:3;
} col;

#pragma pack(push, 1)
typedef struct {
    char A:3;
    char B:3;
    char C:3;
    char D:3;
    char E:3;
} col2;
#pragma pack(pop)

int main(){
    printf("size of col: %zu\n", sizeof(col));   // 3
    printf("size of col2: %zu\n", sizeof(col2)); // 2
}
Note that the default behaviour of the compiler is there for a reason and will probably give you better performance.

Even though the ANSI C standard specifies too little about how bitfields are packed to offer any significant advantage over "compilers are allowed to pack bitfields however they see fit", it nonetheless in many cases forbids compilers from packing things in the most efficient fashion.
In particular, if a structure contains bitfields, a compiler is required to store it as a structure which contains one or more anonymous fields of some "normal" storage type and then logically subdivide each such field into its constituent bitfield parts. Thus, given:
unsigned char foo1: 3;
unsigned char foo2: 3;
unsigned char foo3: 3;
unsigned char foo4: 3;
unsigned char foo5: 3;
unsigned char foo6: 3;
unsigned char foo7: 3;
If unsigned char is 8 bits, the compiler would be required to allocate four fields of that type, and assign two bitfields to all but one (which would be in a char field of its own). If all char declarations had been replaced with short, then there would be two fields of type short, one of which would hold five bitfields and the other of which would hold the remaining two.
On a processor without alignment restrictions, the data could be laid out more efficiently by using unsigned short for the first five fields and unsigned char for the last two, storing seven three-bit fields in three bytes. While it should be possible to store eight three-bit fields in three bytes, a compiler could only allow that if there existed a three-byte numeric type which could be used as the "outer field" type.
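A minimal sketch of that accounting (the struct names are mine; the sizes assume an 8-bit char and a 16-bit short, and are strictly implementation-defined):

#include <stdio.h>

struct as_char {    /* four char units: three hold two fields each, one holds the last */
    unsigned char foo1:3, foo2:3, foo3:3, foo4:3, foo5:3, foo6:3, foo7:3;
};

struct as_short {   /* two short units: one holds five fields, the other holds two */
    unsigned short foo1:3, foo2:3, foo3:3, foo4:3, foo5:3, foo6:3, foo7:3;
};

int main(void)
{
    printf("%zu\n", sizeof(struct as_char));  /* typically 4 */
    printf("%zu\n", sizeof(struct as_short)); /* typically 4 */
    return 0;
}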
Personally, I consider bitfields as defined to be basically useless. If code needs to work with binary-packed data, it should explicitly define storage locations of actual types, and then use macros or some other such means to access the bits thereof. It would be helpful if C supported a syntax like:
unsigned short f1;
unsigned char f2;
union foo1 = f1:0.3;
union foo2 = f1:3.3;
union foo3 = f1:6.3;
union foo4 = f1:9.3;
union foo5 = f1:12.3;
union foo6 = f2:0.3;
union foo7 = f2:3.3;
Such a syntax, if allowed, would make it possible for code to use bitfields in a portable fashion, without regard for word sizes or byte orderings (foo1 would be in the three least-significant bits of f1, but those could be stored at the lower or higher address). Absent such a feature, however, macros are probably the only portable way to operate on such things.
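A sketch of that macro approach (everything here is hypothetical; the offsets mirror the f1/f2 layout sketched above):

#include <stdio.h>

static unsigned short f1;   /* explicit storage of actual types, as argued above */
static unsigned char  f2;

/* extract/insert an n-bit field at bit offset 'off' of 'var' */
#define GET_FIELD(var, off, n) \
    (((var) >> (off)) & ((1u << (n)) - 1u))
#define SET_FIELD(var, off, n, val) \
    ((var) = ((var) & ~(((1u << (n)) - 1u) << (off))) | \
             (((unsigned)(val) & ((1u << (n)) - 1u)) << (off)))

#define GET_FOO1()  GET_FIELD(f1, 0, 3)
#define SET_FOO1(v) SET_FIELD(f1, 0, 3, v)
#define GET_FOO6()  GET_FIELD(f2, 0, 3)
#define SET_FOO6(v) SET_FIELD(f2, 0, 3, v)

int main(void)
{
    SET_FOO1(5);
    SET_FOO6(3);
    printf("%u %u\n", GET_FOO1(), GET_FOO6());  /* 5 3 */
    return 0;
}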

Related

Without type casting, how can I fill the bit fields?

#include <iostream>
#include <bitset>
#include <cstring>   // memset
#include <cstdint>   // uint8_t

typedef struct
{
    int i;
    char a[4];
    uint8_t j:1;
    uint8_t k:1;
} abctest;

int main()
{
    abctest tryabc;
    memset(&tryabc, 0x00, sizeof(tryabc));
    std::bitset<1> b;
    b = false;
    std::cout << b << '\n';
    b = true;
    std::cout << sizeof(b) << '\n';
}
My problem is that I have a char array; it is basically a structure received in some module, and this structure contains bit fields. I can use memcpy, but I cannot type cast the buffer to the structure (e.g. if my char* arr is actually of type struct abc, I cannot do abc* temp = (abc*)arr).
All I can do is memcpy, so I want to know how I can fill the bit fields without type casting.
If you know the literal data type and its size in bytes, a variable can be used with bit-shifting to store and extract bits from the array. This is a lower-level technique that still exists in C++ but is more characteristic of the low-level programming style of C.
Another way is to use division and modulo with powers of 2 to encode bits at exact locations. I'd suggest you look up how binary works first, and then work out why shifting to the right by 1 actually divides by 2.
Cheers!
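A minimal sketch of that shifting technique (the offsets are hypothetical and assume the two flag bits live in the buffer's last byte):

#include <cstdio>

int main()
{
    unsigned char buf[9] = {0};   // pretend this arrived from another module

    // store j in bit 0 and k in bit 1 of the last byte
    buf[8] |= (1u << 0);          // j = 1
    buf[8] |= (1u << 1);          // k = 1

    // extract them again by shifting right and masking
    unsigned j = (buf[8] >> 0) & 1u;
    unsigned k = (buf[8] >> 1) & 1u;
    std::printf("j=%u k=%u\n", j, k);   // j=1 k=1
    return 0;
}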
Why can't you typecast a char array into an abctest pointer? I tested it and all works well:
#include <stdio.h>
#include <string.h>   /* memset */
#include <stdint.h>   /* uint8_t */

typedef struct
{
    int i;
    char a[4];
    uint8_t j:1;
    uint8_t k:1;
} abctest;

int main(int argc, char **argv)
{
    char buf[sizeof(abctest)];   /* a hard-coded 9 here would be smaller than the struct */
    abctest *abc = (abctest*)buf;
    memset(buf, 0x00, sizeof(buf));
    printf("%d\n", abc->j);
}
However, while you definitely CAN typecast a char array into an abctest pointer, it doesn't mean you SHOULD do that. I think you should definitely learn about data serialization and deserialization. If you want to convert a complex data structure into a character array, typecasting is not the solution, as the data structure members may have different sizes or alignment constraints on a 64-bit machine than on a 32-bit machine. Furthermore, if you typecast a char array into a struct pointer, the alignment may be incorrect, which may cause problems on RISC processors.
You could serialize the data by e.g. writing i as a 32-bit integer in network byte order, a as 4 characters, and j and k as two bits in one character (the remaining 6 bits unused). Then when you deserialize it, you read i from the 32-bit integer in network byte order, a from 4 characters, and the remaining character gives j and k.
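A sketch of that serialization scheme (the function names are hypothetical; htonl/ntohl come from <arpa/inet.h> on POSIX systems):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl/ntohl */

typedef struct {
    int i;
    char a[4];
    uint8_t j:1;
    uint8_t k:1;
} abctest;

/* pack into a fixed 9-byte wire format: 4 (i) + 4 (a) + 1 (j,k) */
static void serialize(const abctest *s, unsigned char out[9])
{
    uint32_t n = htonl((uint32_t)s->i);
    memcpy(out, &n, 4);
    memcpy(out + 4, s->a, 4);
    out[8] = (unsigned char)((s->j & 1) | ((s->k & 1) << 1));
}

static void deserialize(abctest *s, const unsigned char in[9])
{
    uint32_t n;
    memcpy(&n, in, 4);
    s->i = (int)ntohl(n);
    memcpy(s->a, in + 4, 4);
    s->j = in[8] & 1;
    s->k = (in[8] >> 1) & 1;
}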

Define an enum to be smaller than one byte / Why is this struct larger than one byte?

I'd like to define an enum to be smaller than one byte while maintaining type safety.
Defining an enum as:
enum MyEnum : unsigned char
{
    i, j, k, w
};
I can shrink it to one byte; however, I'd like to make it use only 2 bits, since I will have at most 4 values in it. Can this be done?
In my struct where I use the enum, the following does not work
struct MyStruct
{
    MyEnum mEnum : 2; // This will be 4 bytes in size
};
Thanks!
Update:
The questions comes from this scenario:
enum MyEnum : unsigned char
{
    i, j, k, w
};

struct MyStruct
{
    union
    {
        signed int mXa:3;
        unsigned int mXb:3;
    };
    union
    {
        signed int mYa:3;
        unsigned int mYb:3;
    };
    MyEnum mEnum:2;
};
sizeof(MyStruct) is showing 9 bytes. Ideally I'd like the struct to be 1 byte in size.
Update for implemented solution:
This struct is one byte and offers the same functionality and type safety:
enum MyEnum : unsigned char
{
    i, j, k, w
};

struct MyStruct
{
    union
    {
        struct { MyEnum mEnum:2; char mXa:3; char mXb:3; };
        struct { MyEnum mEnum:2; unsigned char mYa:3; unsigned char mYb:3; };
    };
};
As per the standard definition, a type's sizeof must be at least 1 byte. This is the smallest addressable unit of memory.
The bit field feature you are mentioning allows you to define members of structures to have smaller sizes, but the struct itself may not be smaller, because
it must be at least 1 byte too, and
alignment considerations might require it to be even bigger.
Additionally, you may not take the address of bit field members, since, as said above, a byte is the smallest addressable unit of memory. (You can already see that in sizeof actually returning the number of bytes, not bits: if you expected fewer than CHAR_BIT bits, sizeof would not even be able to express it.)
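For instance (S is a hypothetical name; the commented-out line is rejected by the compiler):

struct S { unsigned b : 2; };

int main()
{
    S s{};
    // unsigned* p = &s.b;   // error: cannot take the address of a bit field
    (void)s;
    return 0;
}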
Bit fields can only share space if they use the same underlying type, and any unused bits are actually left unused: if the sum of bits in an unsigned int bit field is 3 bits, it still takes 4 bytes total. Since both unions have unsigned int members, they're 4 bytes each, but since they are bit fields, they have an alignment of one. So the first union is 4 bytes, the second union is 4 bytes, and the MyEnum is 1 byte, giving 9 in total. Since all of those have an alignment of one, no padding is needed.
Unfortunately, union doesn't really work with bit fields at all; bit fields are for integer types only. The most I could get your data down to without a serious redesign is 3 bytes: http://coliru.stacked-crooked.com/view?id=c6ad03c93d7893ca2095fabc7f72ca48-e54ee7a04e4b807da0930236d4cc94dc
enum MyEnum : unsigned char
{
    i, j, k, w
};

union MyUnion
{
    signed char ma:3;   // char to save memory
    unsigned char mb:3;
};

struct MyStruct
{
    MyUnion X;
    MyUnion Y;
    MyEnum mEnum;
};  // this structure is three bytes
In the complete redesign category, you have this: http://coliru.stacked-crooked.com/view?id=58269eef03981e5c219bf86167972906-e54ee7a04e4b807da0930236d4cc94dc
No. C++ defines "char" to be the smallest addressable unit of memory for the platform. You can't address 2 bits.
Bit packing 'Works for me'
#include <iostream>

enum MyEnum : unsigned char
{
    i, j, k, w
};

struct MyStruct
{
    MyEnum mEnum : 2;
    unsigned char val : 6;
};

int main()
{
    std::cout << sizeof(MyStruct);
}
prints out 1. How / what are you measuring?
Edit: Live link
Are you doing something like having a pointer as the next thing in the struct? In that case, you'll have 30 bits of dead space, as pointers must be 4-byte aligned on most 32-bit systems.
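For illustration (a hedged sketch; the exact padding is implementation-specific):

#include <iostream>

enum MyEnum : unsigned char { i, j, k, w };

struct Alone   { MyEnum mEnum : 2; };
struct WithPtr { MyEnum mEnum : 2; void *p; };  // padding inserted before p

int main()
{
    std::cout << sizeof(Alone) << '\n';    // typically 1
    std::cout << sizeof(WithPtr) << '\n';  // typically 8 on 32-bit, 16 on 64-bit
}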
Edit: With your updated example, it's the unions that are breaking you:
enum MyEnum : unsigned char
{
    i, j, k, w
};

struct MyStruct
{
    unsigned char mXb:3;
    unsigned char mYb:3;
    MyEnum mEnum:2;
};
Has size 1. I'm not sure how unions and bit packing work together, though, so I can't be more help.

size of a structure containing bit fields [duplicate]

Possible Duplicate:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
I was trying to understand the concept of bit fields.
But I am not able to figure out why the size of the following structure in CASE III comes out as 8 bytes.
CASE I:
struct B
{
    unsigned char c; // +8 bits
} b;
sizeof(b); // Output: 1 (because unsigned char takes 1 byte on my system)
CASE II:
struct B
{
    unsigned b: 1;
} b;
sizeof(b); // Output: 4 (because unsigned takes 4 bytes on my system)
CASE III:
struct B
{
    unsigned char c; // +8 bits
    unsigned b: 1;   // +1 bit
} b;
sizeof(b); // Output: 8
I don't understand why the output for CASE III comes out as 8. I was expecting 1 (char) + 4 (unsigned) = 5.
You can check the layout of the struct by using offsetof, but it will be something along the lines of:
struct B
{
    unsigned char c;      // +8 bits
    unsigned char pad[3]; // padding
    unsigned int bint;    // your b:1 will be in the first byte of this one
} b;
Now, it is obvious that (in a 32-bit arch.) the sizeof(b) will be 8, isn't it?
The question is, why 3 bytes of padding, and not more or less?
The answer is that the offset of a field within a struct has the same alignment requirements as the type of the field itself. On your architecture, integers are 4-byte aligned, so offsetof(struct B, bint) must be a multiple of 4. It cannot be 0, because c comes before it, so it will be 4. If bint starts at offset 4 and is 4 bytes long, then the size of the struct is 8.
Another way to look at it is that the alignment requirement of a struct is the biggest of any of its fields, so this B will be 4-byte-aligned (as it is your bit field). But the size of a type must be a multiple of the alignment, 4 is not enough, so it will be 8.
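You can probe this with offsetof (offsetof cannot be applied to the bit field itself, so the hand-written equivalent above is used; B_equiv is my name, and the results assume a typical 32- or 64-bit ABI):

#include <stdio.h>
#include <stddef.h>   /* offsetof */

struct B
{
    unsigned char c;
    unsigned b: 1;
};

struct B_equiv            /* the hand-written equivalent from above */
{
    unsigned char c;
    unsigned char pad[3];
    unsigned int bint;    /* b:1 lives in the first byte of this */
};

int main(void)
{
    printf("%zu %zu\n", sizeof(struct B), sizeof(struct B_equiv)); /* typically 8 8 */
    printf("%zu\n", offsetof(struct B_equiv, bint));               /* typically 4 */
    return 0;
}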
I think you're seeing an alignment effect here.
Many architectures require integers to be stored at addresses that are a multiple of the word size.
This is why the char in your third struct is being padded with three more bytes, so that the following unsigned integer starts at an address that is a multiple of the word size.
A char is by definition one byte, ints are 4 bytes on a 32-bit system, and the struct is padded out to a multiple of 4.
See http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86 for some explanation of padding
To keep the accesses to memory aligned, the compiler adds padding; if you pack the structure, it will not add the padding.
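For example, with #pragma pack (widely supported but implementation-specific; the exact result may vary):

#include <stdio.h>

#pragma pack(push, 1)
struct B
{
    unsigned char c;
    unsigned b: 1;
};
#pragma pack(pop)

int main(void)
{
    printf("%zu\n", sizeof(struct B)); /* typically 5: the char plus an unpadded unsigned unit */
    return 0;
}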
I took another look at this and here's what I found.
From the C book: "Almost everything about fields is implementation-dependent."
On my machine:
struct B {
    unsigned c: 8;
    unsigned b: 1;
} b;
printf("%zu\n", sizeof(b));
prints 4, because both bit fields are now packed into a single unsigned storage unit.
You were mixing bit fields with regular struct elements.
BTW, a bit field is defined as "a set of adjacent bits within a single implementation-defined storage unit". So I'm not even sure that the ':8' does what you want; that would seem to not be in the spirit of bit fields (as it's not a bit any more).
The alignment and total size of the struct are platform- and compiler-specific. You cannot expect straightforward, predictable answers here; the compiler can always have some special idea. For example:
struct B
{
    unsigned b0: 1;      // +1 bit
    unsigned char c;     // +8 bits
    unsigned b1: 1;      // +1 bit
};
The compiler may or may not merge the fields b0 and b1 into one integer; it is up to the compiler. Some compilers have command-line switches that control this, and some do not. Another example:
struct B
{
    unsigned short c, d, e;
};
It is up to the compiler to pack or not pack the fields of this struct (assuming a 32-bit platform). The layout of the struct can even differ between DEBUG and RELEASE builds.
I would recommend using only the following pattern:
struct B
{
    unsigned b0: 1;
    unsigned b1: 7;
    unsigned b2: 2;
};
When you have a sequence of bit fields that share the same type, the compiler will put them into one int. Otherwise, various other factors can kick in. Also take into account that in a big project, you write a piece of code and somebody else will write and rewrite the makefile, or move your code from one DLL to another. At that point the compiler flags will be set and changed, and there is a 99% chance those people will have no idea of the alignment requirements for your struct. They will never even open your file.
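If you follow that pattern, it may be worth pinning the assumption down with a compile-time check (static_assert is available via <assert.h> in C11 and natively in C++11; the message is illustrative):

#include <assert.h>   /* provides the static_assert macro in C11 */

struct B
{
    unsigned b0: 1;
    unsigned b1: 7;
    unsigned b2: 2;
};

static_assert(sizeof(struct B) == sizeof(unsigned),
              "bit fields were not merged into one unsigned");

int main(void) { return 0; }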

Bit setting question

In C or C++, it's apparently possible to restrict the number of bits a variable has, so for example:
unsigned char A:1;
unsigned char B:3;
I am unfamiliar however with how it works specifically, so a number of questions:
If I have a class with the following variables:
unsigned char A:1;
unsigned char B:3;
unsigned char C:1;
unsigned char D:3;
What is the above technique actually called?
Is above class four bytes in size, or one byte in size?
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
Is there someway of combining the bits to a centralised byte? So for example:
unsigned char MainByte;
unsigned char A:1; //Can this be made to point at the first bit in MainByte?
unsigned char B:3; //Etc etc
unsigned char C:1;
unsigned char D:3;
Is there an article that covers this topic in more depth?
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
Feel free to mention any other considerations (like compiler restrictions or other limitations).
Thank you.
What is the above technique actually called?
Bitfields. And you're only supposed to use int (signed, unsigned or otherwise) as the "type", not char.
Is above class four bytes in size, or one byte in size?
Neither. It is probably sizeof(int) because the compiler generates a word-sized object. The actual bitfields will be stored within a byte, however. It'll just waste some space.
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
They represent only the bits specified, and will be packed as tightly as possible.
Is there someway of combining the bits to a centralised byte? So for example:
Use a union:
struct bits {
    unsigned A:1;
    unsigned B:3;
    unsigned C:1;
    unsigned D:3;
};

union SplitByte {
    struct bits Bits;
    unsigned char Byte[sizeof(struct bits)];
    /* the array is a trick so the two fields
       are guaranteed to be the same size and
       thus be aligned on the same boundary */
} SplitByteObj;

// access the byte
SplitByteObj.Byte[0]
// access a bitfield
SplitByteObj.Bits.B
Note that there are problems with bitfields, for example when using threads. Bitfields within the same storage unit cannot be accessed individually, so you may get errors if you try to use a mutex to guard each of them. Also, the order in which the fields are laid out is not clearly specified by the standard. Many people prefer to use bitwise operators to implement bitfields manually for that reason.
Is there an article that covers this topic in more depth?
Not many. The first few you'll get when you Google it are about all you'll find. They're not a widely used construct. You'll be best off nitpicking the standard to figure out exactly how they work so that you don't get bitten by a weird edge case. I couldn't tell you exactly where in the standard they're specified.
If 'A:1' is treated like an entire byte, what is the point/purple of it?
It's not, but I've addressed this already.
These are bit-fields.
The details of how these fields are arranged in memory are largely implementation-defined. Typically, you will find that the compiler packs them in some way. But it may take various alignment issues into account.

Forcing unaligned bitfield packing in MSVC

I've a struct of bitfields that add up to 48 bits. On GCC this correctly results in a 6 byte structure, but in MSVC the structure comes out 8 bytes. I need to find some way to force MSVC to pack the struct properly, both for interoperability and because it's being used in a memory-critical environment.
The struct seen below consists of three 15-bit numbers, one 2-bit number, and a 1-bit sign. 15+15+15+2+1 = 48, so in theory it should fit into six bytes, right?
struct S
{
    unsigned short a:15;
    unsigned short b:15;
    unsigned short c:15;
    unsigned short d:2;
    unsigned short e:1;
};
However, compiling this on both GCC and MSVC results in sizeof(S) == 8. Thinking that this might have to do with alignment, I tried using #pragma pack(1) before the struct declaration, telling the compiler to pack to byte, not int, boundaries. On GCC, this worked, resulting in sizeof(S) == 6.
However, on MSVC05, the sizeof still came out to 8, even with pack(1) set! After reading this other SO answer, I tried replacing unsigned short d with unsigned char and unsigned short e with bool. The result is sizeof(S) == 7!
I found that if I split d into two one-bit fields and wedged them in between the other members, the struct finally packed properly.
struct S
{
    unsigned short a:15;
    unsigned short dHi : 1;
    unsigned short b:15;
    unsigned short dLo : 1;
    unsigned short c:15;
    unsigned short e:1;
};
printf( "%zu\n", sizeof(S) ); // "6"
But having d split like that is cumbersome and causes trouble for me later on when I have to work on the struct. Is there some way I can force MSVC to pack this struct into 6 bytes, exactly as GCC does?
It is implementation defined how fields will be placed in the structure. Visual Studio will fit consecutive bitfields into an underlying type, if it can, and waste the leftover space. (C++ Bit Fields in VS)
If you use the type "unsigned __int64" to declare all elements of the structure, you'll get an object with sizeof(S)=8, but the last two bytes will be unused and the first six will contain the data in the format you want.
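A sketch of that suggestion (unsigned __int64 is an MSVC extension; S64 is a name I made up):

struct S64
{
    unsigned __int64 a : 15;
    unsigned __int64 b : 15;
    unsigned __int64 c : 15;
    unsigned __int64 d : 2;
    unsigned __int64 e : 1;
};
// sizeof(S64) == 8: all 48 bits share one 64-bit allocation unit, the data
// sits in the first six bytes, and the last two bytes go unused.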
Alternatively, if you can accept some structure reordering, this will work
#pragma pack(1)
struct S3
{
    unsigned int a:15;
    unsigned int b:15;
    unsigned int d:2;
    unsigned short c:15;
    unsigned short e:1;
};
I don't think so, and I think it's MSVC's behavior that is actually correct, and GCC's that deviates from the standard.
AFAIK, the standard does not permit a bitfield to cross a word boundary of its underlying type.