I have a struct of bitfields that add up to 48 bits. On GCC this correctly results in a 6-byte structure, but in MSVC the structure comes out as 8 bytes. I need to find some way to force MSVC to pack the struct properly, both for interoperability and because it's being used in a memory-critical environment.
The struct seen below consists of three 15-bit numbers, one 2-bit number, and a 1-bit sign. 15+15+15+2+1 = 48, so in theory it should fit into six bytes, right?
struct S
{
unsigned short a:15;
unsigned short b:15;
unsigned short c:15;
unsigned short d:2;
unsigned short e:1;
};
However, compiling this on both GCC and MSVC results in sizeof(S) == 8. Thinking that this might have to do with alignment, I tried using #pragma pack(1) before the struct declaration, telling the compiler to pack to byte, not int, boundaries. On GCC, this worked, resulting in sizeof(S) == 6.
However, on MSVC 2005, the sizeof still came out to 8, even with pack(1) set! After reading this other SO answer, I tried replacing unsigned short d with unsigned char and unsigned short e with bool. The result is sizeof(S) == 7!
I found that if I split d into two one-bit fields and wedged them in between the other members, the struct finally packed properly.
struct S
{
unsigned short a:15;
unsigned short dHi : 1;
unsigned short b:15;
unsigned short dLo : 1;
unsigned short c:15;
unsigned short e:1;
};
printf( "%d\n", sizeof(S) ); // "6"
But having d split like that is cumbersome and causes trouble for me later on when I have to work on the struct. Is there some way I can force MSVC to pack this struct into 6 bytes, exactly as GCC does?
It is implementation defined how fields will be placed in the structure. Visual Studio will fit consecutive bitfields into an underlying type, if it can, and waste the leftover space. (C++ Bit Fields in VS)
If you use the type "unsigned __int64" to declare all elements of the structure, you'll get an object with sizeof(S)=8, but the last two bytes will be unused and the first six will contain the data in the format you want.
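A minimal sketch of that variant (the struct name S64 is just for illustration; unsigned __int64 is MSVC-specific):

struct S64
{
unsigned __int64 a:15;
unsigned __int64 b:15;
unsigned __int64 c:15;
unsigned __int64 d:2;
unsigned __int64 e:1;
};
// sizeof(struct S64) == 8 on MSVC: all 48 data bits share one 64-bit unit,
// and the last two bytes are padding.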
Alternatively, if you can accept some structure reordering, this will work:
#pragma pack(1)
struct S3
{
unsigned int a:15;
unsigned int b:15;
unsigned int d:2;
unsigned short c:15;
unsigned short e:1;
};
I don't think so, and I think it's MSVC's behavior that is actually correct, and GCC that deviates from the standard.
AFAIK, the standard does not permit bitfields to cross word boundaries of the underlying type.
Related
I have defined this struct:
typedef struct
{
char A:3;
char B:3;
char C:3;
char D:3;
char E:3;
} col;
The sizeof(col) gives me an output of 3, but shouldn't it be 2? If I comment out just one element, the sizeof is 2. I don't understand why: five elements of 3 bits each come to 15 bits, which is less than 2 bytes.
Is there an "internal size" in defining a structure like this one? I just need a clarification, because from my notion of the language so far, I expected a size of 2 bytes, not 3.
Because you are using char as the underlying type for your fields, the compiler tries to group bits by bytes, and since it cannot put more than eight bits in each byte, it can only store two fields per byte.
The total sum of bits your struct uses is 15, so the ideal size to fit that much data would be a short.
#include <stdio.h>
typedef struct
{
char A:3;
char B:3;
char C:3;
char D:3;
char E:3;
} col;
typedef struct {
short A:3;
short B:3;
short C:3;
short D:3;
short E:3;
} col2;
int main(){
printf("size of col: %lu\n", sizeof(col));
printf("size of col2: %lu\n", sizeof(col2));
}
The above code (on a 64-bit platform like mine) will indeed yield 2 for the second struct. For anything larger than a short, the struct will fill no more than one element of the used type, so, on that same platform, the struct will end up with a size of four for int, eight for long, and so on.
Because you can't have a bit-field that spans across the minimum alignment boundary of its type (which here is 1 byte), they'll probably get packed like this:
byte 1: A:3, B:3, padding:2
byte 2: C:3, D:3, padding:2
byte 3: E:3, padding:5
(The order of fields and padding inside each byte is not meant literally; it's just to give you the idea, since the compiler can lay them out however it prefers.)
The first two bit fields fit into a single char. The third cannot fit into that char and needs a new one, since 3 + 3 + 3 = 9 doesn't fit into an 8-bit char.
So the first pair takes a char, the second pair takes a char, and the last bit field gets a third char.
Most compilers allow you to control the padding, e.g. using #pragmas. Here's an example with GCC 4.8.1:
#include <stdio.h>
typedef struct
{
char A:3;
char B:3;
char C:3;
char D:3;
char E:3;
} col;
#pragma pack(push, 1)
typedef struct {
char A:3;
char B:3;
char C:3;
char D:3;
char E:3;
} col2;
#pragma pack(pop)
int main(){
printf("size of col: %lu\n", sizeof(col)); // 3
printf("size of col2: %lu\n", sizeof(col2)); // 2
}
Note that the default behaviour of the compiler is there for a reason and will probably give you better performance.
Even though the ANSI C standard specifies too little about how bitfields are packed to offer any significant advantage over "compilers are allowed to pack bitfields however they see fit", it nonetheless in many cases forbids compilers from packing things in the most efficient fashion.
In particular, if a structure contains bitfields, a compiler is required to store it as a structure which contains one or more anonymous fields of some "normal" storage type and then logically subdivide each such field into its constituent bitfield parts. Thus, given:
unsigned char foo1: 3;
unsigned char foo2: 3;
unsigned char foo3: 3;
unsigned char foo4: 3;
unsigned char foo5: 3;
unsigned char foo6: 3;
unsigned char foo7: 3;
If unsigned char is 8 bits, the compiler would be required to allocate four fields of that type, and assign two bitfields to all but one (which would be in a char field of its own). If all char declarations had been replaced with short, then there would be two fields of type short, one of which would hold five bitfields and the other of which would hold the remaining two.
On a processor without alignment restrictions, the data could be laid out more efficiently by using unsigned short for the first five fields and unsigned char for the last two, storing seven three-bit fields in three bytes. While it should be possible to store eight three-bit fields in three bytes, a compiler could only allow that if there existed a three-byte numeric type which could be used as the "outer field" type.
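As a quick sanity check of the sizes described above, here is a small sketch; the actual results are implementation-defined, so the commented values are only what a typical compiler with 8-bit chars (e.g. GCC) produces:

#include <stdio.h>

struct seven_char {   /* unsigned char as the unit: two 3-bit fields per byte */
    unsigned char foo1:3, foo2:3, foo3:3, foo4:3, foo5:3, foo6:3, foo7:3;
};

struct seven_short {  /* unsigned short as the unit: five fields in one short, two in another */
    unsigned short foo1:3, foo2:3, foo3:3, foo4:3, foo5:3, foo6:3, foo7:3;
};

int main(void)
{
    printf("char-based:  %zu\n", sizeof(struct seven_char));   /* typically 4 */
    printf("short-based: %zu\n", sizeof(struct seven_short));  /* typically 4 */
    return 0;
}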
Personally, I consider bitfields as defined to be basically useless. If code needs to work with binary-packed data, it should explicitly define storage locations of actual types, and then use macros or some other such means to access the bits thereof. It would be helpful if C supported a syntax like:
unsigned short f1;
unsigned char f2;
union foo1 = f1:0.3;
union foo2 = f1:3.3;
union foo3 = f1:6.3;
union foo4 = f1:9.3;
union foo5 = f1:12.3;
union foo6 = f2:0.3;
union foo7 = f2:3.3;
Such a syntax, if allowed, would make it possible for code to use bitfields in a portable fashion, without regard for word sizes or byte orderings (foo1 would be in the three least-significant bits of f1, but those could be stored at the lower or higher address). Absent such a feature, however, macros are probably the only portable way to operate on such things.
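Lacking such a syntax, here is a macro-based sketch of the same idea; the GET_BITS/SET_BITS helpers and the fields struct below are illustrative names, not an existing API:

#include <stdint.h>

/* Extract or replace 'width' bits starting at bit 'pos' of an unsigned word. */
#define GET_BITS(word, pos, width) \
    (((word) >> (pos)) & ((1u << (width)) - 1u))
#define SET_BITS(word, pos, width, value) \
    ((word) = ((word) & ~(((1u << (width)) - 1u) << (pos))) | \
              (((value) & ((1u << (width)) - 1u)) << (pos)))

struct fields { uint16_t f1; uint8_t f2; };

/* foo1 occupies bits 0..2 of f1, foo2 bits 3..5, ...; foo6 bits 0..2 of f2. */
#define GET_FOO1(s)    GET_BITS((s).f1, 0, 3)
#define SET_FOO1(s, v) SET_BITS((s).f1, 0, 3, (v))
#define GET_FOO6(s)    GET_BITS((s).f2, 0, 3)
#define SET_FOO6(s, v) SET_BITS((s).f2, 0, 3, (v))

Because the bit positions are spelled out explicitly, this layout does not depend on how any particular compiler allocates bitfields.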
I was just playing around with bit fields and came across something that I can't quite figure out how to get around.
(Note about the platform: size of an int = 2 bytes, long = 4 bytes, long long = 8 bytes - thought it worth mentioning as I know it can vary. Also, the 'byte' type is defined as an 'unsigned char'.)
I would like to be able to make an array of two 36 bit variables and put them into a union with an array of 9 bytes. This is what I came up with:
typedef union {
byte bytes[9];
struct {
unsigned long long data:36;
} integers[2];
} Colour;
I was working on the theory that the compiler would realise there were supposed to be two bitfields as part of the anonymous struct and put them together into the space of 9 bytes. However, it turns out that they get aligned at a byte boundary, so the union occupies 10 bytes, not 9, which makes perfect sense.
The question is then, is there a way to create an array of two bit fields like this? I considered the 'packed' attribute, but the compiler just ignores it.
While this works as expected (sizeof() returns 9):
typedef union {
byte bytes[9];
struct {
unsigned long long data0:36;
unsigned long long data1:36;
} integers;
} Colour;
It would be preferable to have it accessible as an array.
Edit:
Thanks to cdhowie for his explanation of why this won't work.
Fortunately I thought of a way to achieve what I want:
typedef union {
byte bytes[9];
struct {
unsigned long long data0:36;
unsigned long long data1:36;
unsigned long long data(byte which){
return (which?data1:data0);
}
void data(byte which, unsigned long long _data){
if(which){
data1 = _data;
} else {
data0 = _data;
}
}
} integers;
} Colour;
You can't directly do this using arrays, if you want each bitfield to be exactly 36 bits wide.
Pointers must be aligned to byte boundaries; that's just the way pointers are. Since arrays function like pointers in most cases (with exceptions), this is just not possible with bitfields that contain a number of bits not evenly divisible by 8. (What would you expect &(((Colour *) 0)->integers[1]) to return if the bitfields were packed? What value would make sense?)
In your second example, the bitfields can be tightly-packed because there is no pointer math going on under the hood. For things to be addressable by pointer, they must fall on a byte boundary, since bytes are the units used to "measure" pointers.
You will note that if you try to take the address of (((Colour *) 0)->integers.data0) or data1 in the second example, the compiler will issue an error, for exactly this reason.
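If array-style access is the goal, a shift-and-mask sketch over the raw bytes sidesteps bitfields entirely. The layout here is an assumption (value 0 in the low 36 bits of the 9-byte payload, value 1 in the high 36 bits, bits numbered little-endian), not necessarily what the compiler's bitfield layout would give you:

#include <stdint.h>

typedef unsigned char byte;                /* as in the question */
typedef struct { byte bytes[9]; } Colour9; /* hypothetical name to avoid clashing with Colour */

/* Read 36-bit value number 'which' (0 or 1) out of the 72-bit payload. */
static uint64_t colour_get(const Colour9 *c, int which)
{
    uint64_t v = 0;
    int start = which * 36;  /* first payload bit of the requested value */
    for (int i = 0; i < 36; ++i) {
        int bit = start + i;
        v |= (uint64_t)((c->bytes[bit / 8] >> (bit % 8)) & 1u) << i;
    }
    return v;
}

/* Write 36-bit value number 'which' (0 or 1) into the payload. */
static void colour_set(Colour9 *c, int which, uint64_t v)
{
    int start = which * 36;
    for (int i = 0; i < 36; ++i) {
        int bit = start + i;
        byte mask = (byte)(1u << (bit % 8));
        if ((v >> i) & 1u)
            c->bytes[bit / 8] |= mask;
        else
            c->bytes[bit / 8] &= (byte)~mask;
    }
}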
In C or C++, it's apparently possible to restrict the number of bits a variable has, so for example:
unsigned char A:1;
unsigned char B:3;
However, I am unfamiliar with how it works specifically, so I have a number of questions:
If I have a class with the following variables:
unsigned char A:1;
unsigned char B:3;
unsigned char C:1;
unsigned char D:3;
What is the above technique actually called?
Is the above class four bytes in size, or one byte in size?
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
Is there some way of combining the bits into a centralised byte? For example:
unsigned char MainByte;
unsigned char A:1; //Can this be made to point at the first bit in MainByte?
unsigned char B:3; //Etc etc
unsigned char C:1;
unsigned char D:3;
Is there an article that covers this topic in more depth?
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
Feel free to mention any other considerations (like compiler restrictions or other limitations).
Thank you.
What is the above technique actually called?
Bitfields. And you're only supposed to use int (signed, unsigned or otherwise) as the "type", not char.
Is the above class four bytes in size, or one byte in size?
Neither. It is probably sizeof(int) because the compiler generates a word-sized object. The actual bitfields will be stored within a byte, however. It'll just waste some space.
Are the variables treated as 1 (or 3) bits as shown, or as per the 'unsigned char', treated as a byte each?
They represent only the bits specified, and will be packed as tightly as possible.
Is there some way of combining the bits into a centralised byte? For example:
Use a union:
struct bits {
unsigned A:1;
unsigned B:3;
unsigned C:1;
unsigned D:3;
};
union SplitByte {
struct bits Bits;
unsigned char Byte[sizeof(struct bits)];
/* the array is a trick so the two fields
are guaranteed to be the same size and
thus be aligned on the same boundary */
} SplitByteObj;
// access the byte
SplitByteObj.Byte[0]
// access a bitfield
SplitByteObj.Bits.B
Note that there are problems with bitfields, for example when using threads. Bitfields cannot be accessed individually (the compiler has to read and rewrite the whole underlying storage unit), so you may run into trouble if you try to use a mutex to guard each of them separately. Also, the order in which the fields are laid out is not clearly specified by the standard. Many people prefer to use bitwise operators to implement bitfields manually for that reason; a sketch of that approach follows.
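For the four fields from the question, the manual approach might look like this, with the bit positions chosen explicitly (an assumption, rather than whatever layout a given compiler would pick):

#include <stdint.h>

/* Chosen layout within one byte: A in bit 0, B in bits 1-3, C in bit 4, D in bits 5-7. */
static inline unsigned get_A(uint8_t v) { return v & 0x01u; }
static inline unsigned get_B(uint8_t v) { return (v >> 1) & 0x07u; }
static inline unsigned get_C(uint8_t v) { return (v >> 4) & 0x01u; }
static inline unsigned get_D(uint8_t v) { return (v >> 5) & 0x07u; }

static inline uint8_t set_B(uint8_t v, unsigned b)
{
    return (uint8_t)((v & ~(0x07u << 1)) | ((b & 0x07u) << 1));
}

Because the positions are spelled out, the resulting byte has the same layout on every compiler and can safely be shared between platforms.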
Is there an article that covers this topic in more depth?
Not many. The first few you'll get when you Google it are about all you'll find. They're not a widely used construct. You'll be best off nitpicking the standard to figure out exactly how they work so that you don't get bitten by a weird edge case. I couldn't tell you exactly where in the standard they're specified.
If 'A:1' is treated like an entire byte, what is the point/purpose of it?
It's not, but I've addressed this already.
These are bit-fields.
The details of how these fields are arranged in memory are largely implementation-defined. Typically, you will find that the compiler packs them in some way. But it may take various alignment issues into account.
OK, I have a struct in my C++ program that is like this:
struct thestruct
{
unsigned char var1;
unsigned char var2;
unsigned char var3[2];
unsigned char var4;
unsigned char var5[8];
int var6;
unsigned char var7[4];
};
When I use this struct, 3 random bytes get added before var6. If I delete var5, they still appear before var6, so I know the extra bytes always go before var6.
But if I remove var6, the 3 extra bytes are gone.
If I only use a struct with an int in it, there are no extra bytes.
So there seems to be a conflict between the unsigned char members and the int; how can I fix that?
The compiler is probably using its default alignment option, where members of size x are aligned on a memory boundary evenly divisible by x.
Depending on your compiler, you can affect this behaviour using a #pragma directive, for example:
#pragma pack(1)
will turn off the default alignment in Visual C++:
Specifies the value, in bytes, to be used for packing. The default value for n is 8. Valid values are 1, 2, 4, 8, and 16. The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller.
Note that for low-level CPU performance reasons, it is usually best to try to align your data members so that they fall on an aligned boundary. Some CPU architectures require alignment, while others (such as Intel x86) tolerate misalignment with a decrease in performance (sometimes quite significantly).
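For example, a sketch of the original struct wrapped in a packing pragma (push/pop so the setting doesn't leak into other code); the name thestruct_packed is just for illustration:

#pragma pack(push, 1)
struct thestruct_packed
{
    unsigned char var1;
    unsigned char var2;
    unsigned char var3[2];
    unsigned char var4;
    unsigned char var5[8];
    int var6;              /* no padding bytes inserted before it any more */
    unsigned char var7[4];
};
#pragma pack(pop)
/* sizeof(struct thestruct_packed) is 21 instead of 24 (assuming a 4-byte int). */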
Your data structure is being aligned so that your int falls on a word boundary, which for your target might be 32 or 64 bits.
You can reorganize your struct like this so that it won't happen:
struct thestruct
{
int var6;
unsigned char var1;
unsigned char var2;
unsigned char var3[2];
unsigned char var4;
unsigned char var5[8];
unsigned char var7[4];
};
Are you talking about padding bytes? That's not a bug. As allowed by the C++ standard, the compiler is adding padding to keep the members aligned. This is required for some architectures, and will greatly improve performance for others.
You're having a byte alignment problem. The compiler is adding padding to align the bytes. See this wikipedia article.
Read up on data structure alignment. Essentially, depending on the compiler and compile options, you'll get alignment onto different powers-of-2.
To avoid it, move multi-byte items (int or pointers) before single-byte (signed or unsigned char) items -- although padding might still be added after your last item.
While rearranging the order you declare data members inside your struct is fine, it should be emphasized that overriding the default alignment by using #pragmas and such is a bad idea unless you know exactly what you're doing. Depending on your compiler and architecture, attempting to access unaligned data, particularly by storing the address in a pointer and later trying to dereference it, can easily give the dreaded Bus Error or other undefined behavior.
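A sketch of the kind of code that warning is about; direct member access through the packed struct is compiled correctly, but smuggling the member's address into an ordinary int pointer and dereferencing it is undefined behaviour and can fault on strict-alignment CPUs (the packed_msg struct is illustrative):

#pragma pack(push, 1)
struct packed_msg
{
    unsigned char tag;
    int value;             /* at offset 1, i.e. misaligned for an int */
};
#pragma pack(pop)

void example(struct packed_msg *m)
{
    int ok  = m->value;    /* fine: the compiler knows this member is packed */
    int *p  = &m->value;   /* compilers such as GCC warn about this... */
    int bad = *p;          /* ...because the plain int load may trap or misbehave */
    (void)ok; (void)bad;
}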
To begin with, the application in question is always going to be on the same processor, and the compiler is always gcc, so I'm not concerned about bitfields not being portable.
gcc lays out bitfields such that the first listed field occupies the least significant bits of a byte. So with the following structure, setting a=0, b=1, c=1, d=1 gives a byte of value e0.
struct Bits {
unsigned int a:5;
unsigned int b:1;
unsigned int c:1;
unsigned int d:1;
} __attribute__((__packed__));
(Actually, this is C++, so I'm talking about g++.)
Now let's say I'd like a to be a six bit integer.
Now, I can see why this won't work, but I coded the following structure:
struct Bits2 {
unsigned int a:6;
unsigned int b:1;
unsigned int c:1;
unsigned int d:1;
} __attribute__((__packed__));
Setting b, c, and d to 1, and a to 0 results in the following two bytes:
c0 01
This isn't what I wanted. I was hoping to see this:
e0 00
Is there any way to specify a structure that has three bits in the most significant bits of the first byte and six bits spanning the five least significant bits of the first byte and the most significant bit of the second?
Please be aware that I have no control over where these bits are supposed to be laid out: it's a layout of bits that are defined by someone else's interface.
(Note that all of this is gcc-specific commentary - I'm well aware that the layout of bitfields is implementation-defined).
Not on a little-endian machine: the problem is that, on such a machine, the most significant bit of the second byte isn't considered "adjacent" to the least significant bits of the first byte.
You can, however, combine the bitfields with the ntohs() function:
union u_Bits2{
struct Bits2 {
uint16_t _padding:7;
uint16_t a:6;
uint16_t b:1;
uint16_t c:1;
uint16_t d:1;
} bits __attribute__((__packed__));
uint16_t word;
};
union u_Bits2 flags;
flags.word = ntohs(flag_bytes_from_network);
However, I strongly recommend you avoid bitfields and instead use shifting and masks.
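For this particular wire format, a shift-and-mask sketch might look like the following; decode/encode are illustrative names, and buf[] holds the two bytes in wire order (d, c, b in the top three bits of buf[0], the high five bits of a below them, and a's lowest bit in the most significant bit of buf[1]):

#include <stdint.h>

static void decode(const uint8_t buf[2],
                   unsigned *a, unsigned *b, unsigned *c, unsigned *d)
{
    *d = (buf[0] >> 7) & 1u;
    *c = (buf[0] >> 6) & 1u;
    *b = (buf[0] >> 5) & 1u;
    *a = ((buf[0] & 0x1Fu) << 1) | (buf[1] >> 7);  /* six bits spanning the byte boundary */
}

static void encode(uint8_t buf[2],
                   unsigned a, unsigned b, unsigned c, unsigned d)
{
    buf[0] = (uint8_t)(((d & 1u) << 7) | ((c & 1u) << 6) | ((b & 1u) << 5) | ((a >> 1) & 0x1Fu));
    buf[1] = (uint8_t)((a & 1u) << 7);
}

/* With a=0 and b=c=d=1, encode() produces the desired "e0 00". */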
Usually you can't make strong assumptions about how the union will be packed; every compiler implementation may choose to pack it differently (to save space or to align bitfields inside bytes).
I would suggest you just work with masking and bitwise operators.
from this link:
The main use of bitfields is either to allow tight packing of data or to be able to specify the fields within some externally produced data files. C gives no guarantee of the ordering of fields within machine words, so if you do use them for the latter reason, your program will not only be non-portable, it will be compiler-dependent too. The Standard says that fields are packed into ‘storage units’, which are typically machine words. The packing order, and whether or not a bitfield may cross a storage unit boundary, are implementation defined. To force alignment to a storage unit boundary, a zero width field is used before the one that you want to have aligned.
C/C++ has no means of specifying the bit-by-bit memory layout of structs, so you will need to do manual bit shifting and masking on 8- or 16-bit (unsigned) integers (uint8_t, uint16_t from <stdint.h> or <cstdint>).
Of the good dozen of programming languages I know, only very few allow you to specify bit-by-bit memory layout for bit fields: Ada, Erlang, VHDL (and Verilog).
(Community wiki if you want to add more languages to that list.)