I am not totally sure about C, but C++ allows unnamed bit-fields of 0 length. For example:
struct X
{
int : 0;
};
Question one: What practical uses of this can you think of?
Question two: What real-world practical uses (if any) are you aware of?
Edited the example after ice-crime's answer
Edit: OK, thanks to the current answers I now know the theoretical purpose. But the questions are about practical uses so they still hold :)
You use a zero-length bitfield as a hacky way to get your compiler to lay out a structure to match some external requirement, be it another compiler's or architecture's notion of the layout (cross-platform data structures, such as in a binary file format) or a bit-level standard's requirements (network packets or instruction opcodes).
A real-world example is when NeXT ported the xnu kernel from the Motorola 68000 (m68k) architecture to the i386 architecture. NeXT had a working m68k version of their kernel. When they ported it to i386, they found that the i386's alignment requirements differed from the m68k's in such a way that an m68k machine and an i386 machine did not agree on the layout of the NeXT vendor-specific BOOTP structure. In order to make the i386 structure layout agree with the m68k, they added an unnamed bitfield of length zero to force the NV1 structure/nv_U union to be 16-bit aligned.
Here are the relevant parts from the Mac OS X 10.6.5 xnu source code:
/* from xnu/bsd/netinet/bootp.h */
/*
* Bootstrap Protocol (BOOTP). RFC 951.
*/
/*
* HISTORY
*
* 14 May 1992 ? at NeXT
* Added correct padding to struct nextvend. This is
* needed for the i386 due to alignment differences wrt
* the m68k. Also adjusted the size of the array fields
* because the NeXT vendor area was overflowing the bootp
* packet.
*/
/* . . . */
struct nextvend {
u_char nv_magic[4]; /* Magic number for vendor specificity */
u_char nv_version; /* NeXT protocol version */
/*
* Round the beginning
* of the union to a 16
* bit boundary due to
* struct/union alignment
* on the m68k.
*/
unsigned short :0;
union {
u_char NV0[58];
struct {
u_char NV1_opcode; /* opcode - Version 1 */
u_char NV1_xid; /* transcation id */
u_char NV1_text[NVMAXTEXT]; /* text */
u_char NV1_null; /* null terminator */
} NV1;
} nv_U;
};
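You can check the effect of the `unsigned short :0;` line with offsetof. The sketch below uses simplified, hypothetical copies of struct nextvend. The union keeps only NV0, so NVMAXTEXT is not needed, and u_char is typedef'd locally. Using unsigned short as a bit-field type is itself implementation-defined in C. GCC and Clang accept it, and on their usual targets the zero-width field moves nv_U from offset 5 to a 2-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u_char;   /* normally supplied by the system headers */

/* Hypothetical copy WITHOUT the zero-width bit-field. */
struct nextvend_unpadded {
    u_char nv_magic[4];
    u_char nv_version;
    union { u_char NV0[58]; } nv_U;   /* alignment 1: starts right after nv_version */
};

/* Hypothetical copy WITH the zero-width bit-field, as in bootp.h. */
struct nextvend_padded {
    u_char nv_magic[4];
    u_char nv_version;
    unsigned short :0;                /* ask for the next field on a 16-bit boundary */
    union { u_char NV0[58]; } nv_U;
};

/* Offset of the union in each variant. */
size_t unpadded_union_offset(void) { return offsetof(struct nextvend_unpadded, nv_U); }
size_t padded_union_offset(void)   { return offsetof(struct nextvend_padded, nv_U); }
```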
The standard (9.6/2) only allows 0 length bit-fields as a special case:
As a special case, an unnamed bit-field with a width of zero specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed bit-field may the constant-expression be a value equal to zero.
The only use is described in this quote, although I haven't yet encountered it in practical code.
For the record, I just tried the following code under VS 2010:
#include <iostream>
struct X {
int i : 3, j : 5;
};
struct Y {
int i : 3, : 0, j : 5; // nice syntax, huh?
};
int main()
{
std::cout << sizeof(X) << " - " << sizeof(Y) << std::endl;
}
The output on my machine is indeed: 4 - 8.
struct X { int : 0; };
is undefined behavior in C.
See (emphasis mine):
(C99, 6.7.2.1p2) "The presence of a struct-declaration-list in a struct-or-union-specifier declares a new type, within a translation unit. The struct-declaration-list is a sequence of declarations for the members of the structure or union. If the struct-declaration-list contains no named members, the behavior is undefined"
(C11 has the same wording.)
You can use an unnamed bit-field with 0 width but not if there is no other named member in the structure.
For example:
struct W { int a:1; int :0; }; // OK
struct X { int :0; }; // Undefined Behavior
By the way, for the second declaration gcc issues a diagnostic (not required by the C Standard) when invoked with -pedantic.
On the other hand:
struct X { int :0; };
is defined in GNU C. It is used, for example, by the Linux kernel (include/linux/bug.h) in the following macro, which forces a compilation error if the condition is true:
#define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); }))
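The kernel macro relies on the GNU C behaviour described above. If you only need the same check in ISO C11, one sketch is to keep a named member, so the struct does not trigger undefined behaviour. The condition then moves into a _Static_assert, which C11 allows inside a struct declaration list. ZERO_OR_BUILD_ERROR and table are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ISO C11 counterpart of BUILD_BUG_ON_ZERO: compilation fails
   if e is nonzero; otherwise the whole expression evaluates to 0 (size_t).
   The named member `dummy` avoids the "no named members" undefined behaviour. */
#define ZERO_OR_BUILD_ERROR(e) \
    (0 * sizeof(struct { int dummy; _Static_assert(!(e), "condition must be false"); }))

/* Usage: an array bound that also carries a compile-time check. */
static int table[4 + ZERO_OR_BUILD_ERROR(sizeof(int) < 2)];

size_t table_length(void)
{
    return sizeof table / sizeof table[0];
}
```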
This is from MSDN and is not marked as Microsoft Specific, so I guess this is standard C++:
An unnamed bit field of width 0 forces alignment of the next bit field to the next type boundary, where type is the type of the member.
The C11 standard, like earlier versions, allows zero-length bit-fields, and its memory model gives them a practical use. Here is an example from the C Committee draft (N1570), which I believe illustrates it.
3.14 memory location
...
4. EXAMPLE A structure declared as
struct {
char a;
int b:5, c:11, :0, d:8;
struct { int ee:8; } e;
}
contains four separate memory locations: The member a, and bit-fields d and e.ee are each separate memory locations, and can be modified concurrently without interfering with each other. The bit-fields b and c together constitute the fourth memory location. The bit-fields b and c cannot be concurrently modified, but b and a, for example, can be.
So including the zero length bitfield in between the bitfields c and d allows the concurrent modification of b and d as well.
Related
Why do the sizes of these two structs differ?
#pragma pack(push, 1)
struct WordA
{
uint32_t address : 8;
uint32_t data : 20;
uint32_t sign : 1;
uint32_t stateMatrix : 2;
uint32_t parity : 1;
};
struct WordB
{
uint8_t address;
uint32_t data : 20;
uint8_t sign : 1;
uint8_t stateMatrix : 2;
uint8_t parity : 1;
};
#pragma pack(pop)
Somehow WordB occupies 6 bytes instead of four, while WordA occupies exactly 32 bits.
I assumed that, since the total number of bits used is the same, both structs would have the same size. Apparently I am wrong, but I cannot find an explanation why.
The bit-fields page only shows examples where all of the struct members have the same type, which is the case for WordA.
Can anybody explain why the sizes don't match, and whether this is according to the standard or implementation-defined?
Why can't a bit field be split between different underlying types?
It can, in the sense that the standard allows it.
It wasn't split here because that's what the language implementer (or rather, the designer of the ABI) chose. This decision may have been preferred because it may make the program faster or the compiler easier to implement.
Here is the standard quote:
[class.bit]
... Allocation of bit-fields within a class object is implementation-defined.
Alignment of bit-fields is implementation-defined.
Bit-fields are packed into some addressable allocation unit.
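Because the allocation of bit-fields is implementation-defined, one portable alternative is to keep the word as a plain uint32_t and pack the fields with shifts and masks. This approach is not taken from the answer above. The bit positions below are an assumption (address in the low byte, parity in the top bit), so adjust them to whatever the real format requires.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignment for a WordA-style word, counted from bit 0:
   address 0-7, data 8-27, sign 28, stateMatrix 29-30, parity 31. */
#define ADDRESS_SHIFT 0u
#define ADDRESS_MASK  0xFFu
#define DATA_SHIFT    8u
#define DATA_MASK     0xFFFFFu
#define SIGN_SHIFT    28u
#define SIGN_MASK     0x1u
#define STATE_SHIFT   29u
#define STATE_MASK    0x3u
#define PARITY_SHIFT  31u
#define PARITY_MASK   0x1u

/* Write `value` (truncated to the field width) into the field. */
static uint32_t set_field(uint32_t word, uint32_t shift, uint32_t mask, uint32_t value)
{
    word &= ~(mask << shift);          /* clear the old field bits */
    word |= (value & mask) << shift;   /* insert the new value */
    return word;
}

/* Read the field back out of the word. */
static uint32_t get_field(uint32_t word, uint32_t shift, uint32_t mask)
{
    return (word >> shift) & mask;
}
```

The resulting uint32_t has the same value on every compiler. If it is written to a file or sent over a wire, however, you still have to choose the byte order explicitly.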
I have defined a bitfield of enum types to match a set of bits in an embedded system. I'm trying to write a test harness in MSVC for the code, but comparing what should be equal values fails.
The definition looks like this:
typedef enum { SERIAL, PARALLEL } MODE_e;
typedef union {
struct {
TYPE_e Type : 1; // 1
POSITION_e Pos1 : 1; // 2
POSITION_e Pos2 : 1; // 3
bool Enable : 1; // 4
NET_e Net : 1; // 5
TYPE_e Type2 : 1; // 6
bool En : 1; // 7
TIME_e Time : 3; // 8-10
MODE_e Mode : 1; // 11
bool TestEn : 1; // 12
bool DelayEn : 1; // 13
MODE_e Mode2 : 1; // 14
bool xEn : 1; // 15
MODE_e yMode : 1; // 16
bool zEnable : 1; // 17
} Bits;
uint32_t Word;
} BITS_t;
Later the following comparison fails:
store.Bits.Mode = PARALLEL;
if (store.Bits.Mode == PARALLEL)
...
I examined the Mode field in the debugger, and it looked odd: its value is -1.
It's as if MSVC considers the value to be a two's complement number, but 1 bit wide, so 0b1 is decimal -1. The enum sets PARALLEL to 1, so the two do not match.
The comparison works fine on the embedded side using LLVM or GCC.
Which behavior is correct? I assume GCC and LLVM have better support for the C standards than MSVC in areas such as bit fields. More importantly, can I work around this difference without making major changes to the embedded code?
Dissecting this in detail, you have the following problems:
There is no guarantee that Type : 1 is the MSB or LSB. Generally, there are no guarantees of the bit-field layout in memory at all.
As mentioned in other answers, enumeration variables (unlike enumeration constants) have an implementation-defined size, meaning that you can't portably know their size. In addition, if that type is not int (signed or unsigned) or _Bool, the compiler need not support it as a bit-field type at all.
Enums are most often a signed integer type. And when you create a bit-field of size 1 with a signed type, nobody including the standard knows what it means. Is it the sign bit you intend to store there or is it data?
The size of what the C standard calls the "storage unit" of a bit-field is unspecified. Typically it is alignment-based. The C standard does say that a bit-field immediately following another bit-field is packed into the same storage unit if there is room, without distinguishing by type. However, the implementation chooses the unit, and in practice compilers differ once the declared type changes.
It is fairly common that when you go from one type like POSITION_e to a different type like bool, the compiler places them in different storage units. In practice, this means there's a high risk of padding bits being inserted whenever this happens. Lots of mainstream compilers do in fact behave just like that.
In addition, a struct or union may contain padding bytes anywhere.
In addition, there is the endianness problem.
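To see what a particular compiler does, you can probe where a 1-bit field lands in memory. This is a diagnostic sketch, not a portable technique. The result depends on both the compiler's bit-field allocation order and the machine's byte order, which is exactly the problem described here. On typical little-endian GCC or Clang targets it reports bit 0. The names one_bit and first_bitfield_position are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

struct one_bit { unsigned first : 1; };

/* Returns the position of `first` in memory, counting from bit 0 (least
   significant) of the struct's first byte, or -1 if no bit is set.
   The answer is implementation-defined by design. */
int first_bitfield_position(void)
{
    struct one_bit s;
    unsigned char bytes[sizeof s];

    memset(&s, 0, sizeof s);        /* start from all-zero bytes */
    s.first = 1;                    /* set only the 1-bit field */
    memcpy(bytes, &s, sizeof s);    /* inspect the object representation */

    for (size_t i = 0; i < sizeof s * CHAR_BIT; ++i) {
        if ((bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u)
            return (int)i;
    }
    return -1;
}
```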
Conclusion: bit-fields cannot be used in programs that need any form of portability. They cannot be used for the purpose of memory mapping.
Also, you really don't need all these abstraction layers - it's a simple dip-switch, not a space shuttle! :)
Solution:
I would strongly recommend dropping all of this in favour of a plain uint32_t. You can mask individual bits with plain integer constants:
#define DIP_TYPE (1u << 31)
#define DIP_POS (1u << 30)
...
uint32_t dipswitch = ...;
bool actuator_active = dipswitch & DIP_TYPE; // read
dipswitch |= DIP_POS; // write
This is massively portable, well-defined, standardized and MISRA-C compliant; you can even port it between architectures with different endianness. It solves all of the above-mentioned problems.
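A complete, compilable version of the mask approach might look like the following. DIP_TYPE and DIP_POS come from the answer. The helper functions dip_is_set, dip_set and dip_clear are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments from the answer; further switches follow the same pattern. */
#define DIP_TYPE (1u << 31)
#define DIP_POS  (1u << 30)

/* Read: any nonzero masked value means the switch is on. */
static bool dip_is_set(uint32_t dipswitch, uint32_t mask)
{
    return (dipswitch & mask) != 0;
}

/* Write: turn the switch on. */
static uint32_t dip_set(uint32_t dipswitch, uint32_t mask)
{
    return dipswitch | mask;
}

/* Write: turn the switch off. */
static uint32_t dip_clear(uint32_t dipswitch, uint32_t mask)
{
    return dipswitch & ~mask;
}
```

Comparing with != 0, or converting to bool as the answer does, matters for high bits such as bit 31. Storing dipswitch & DIP_TYPE in a narrower integer type would silently drop that bit.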
I would use the following approach.
typedef enum { SERIAL_TEST_MODE = 0, PARALLEL_TEST_MODE = 1 } TEST_MODE_e;
Then set the value and test the value as follows.
config.jumpers.Bits.TestMode = PARALLEL_TEST_MODE;
if (config.jumpers.Bits.TestMode & PARALLEL_TEST_MODE)
...
The value of 1 will have the least significant bit turned on and the value of 0 would have the least significant bit turned off.
And this should be portable across multiple compilers.
The exact type used to represent an enum is implementation-defined. So what is most likely happening is that MSVC uses a signed type (int) for this enum. Declaring a 1-bit bit-field of a signed type means you get 0 and -1 as its values.
Rather than declaring the bit-fields with the enum type, declare them as unsigned int or unsigned char so the values are properly represented.
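Here is a minimal sketch of that suggestion, assuming the MODE_e enum from the question. With an unsigned 1-bit field the only possible values are 0 and 1, so PARALLEL (1) round-trips and the comparison holds on any compiler. The struct mode_bits and is_parallel names are hypothetical.

```c
#include <assert.h>

typedef enum { SERIAL, PARALLEL } MODE_e;

/* Unsigned 1-bit field: values 0 and 1 only, never -1. */
struct mode_bits {
    unsigned int Mode : 1;
};

int is_parallel(MODE_e mode)
{
    struct mode_bits b = {0};
    b.Mode = mode;               /* 0 or 1 fits exactly */
    return b.Mode == PARALLEL;   /* b.Mode promotes to int 0 or 1 */
}
```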
A simple fix I came up with, which is only valid for MSVC and GCC/LLVM, is:
#ifdef _WIN32
#define JOFF 0
#define JON -1
#else
#define JOFF 0
#define JON 1
#endif
typedef enum { SERIAL = JOFF, PARALLEL = JON } TEST_MODE_e;
typedef struct {
char c;
char cc[2];
short s;
char ccc;
}stuck;
Should the above struct have a memory layout like this?
1 2 3 4 5 6 7
- c - cc - s - ccc -
or this?
1 2 3 4 5 6 7 8
- c - cc - s - ccc -
I think the first layout should be better, so why does my VS09 compiler choose the second? (And is my layout correct, by the way?) Thank you
I think that your structure will have the following layout, at least on Windows:
typedef struct {
char c;
char cc[2];
char __padding;
short s;
char ccc;
char __tail_padding;
} stuck;
You could avoid the padding by reordering the structure members:
typedef struct {
char c;
char cc[2];
char ccc;
short s;
} stuck;
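You can check whichever layout the compiler picks with sizeof and offsetof from <stddef.h>. The sketch below compares the two declarations. stuck_reordered and print_layouts are hypothetical names. On a typical target where short is 2 bytes with 2-byte alignment, expect sizes of 8 and 6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The question's declaration. */
typedef struct {
    char c;
    char cc[2];
    short s;
    char ccc;
} stuck;

/* The reordered declaration: all chars first, then the short. */
typedef struct {
    char c;
    char cc[2];
    char ccc;
    short s;
} stuck_reordered;

void print_layouts(void)
{
    printf("stuck: size %zu, s at %zu, ccc at %zu\n",
           sizeof(stuck), offsetof(stuck, s), offsetof(stuck, ccc));
    printf("stuck_reordered: size %zu, ccc at %zu, s at %zu\n",
           sizeof(stuck_reordered), offsetof(stuck_reordered, ccc),
           offsetof(stuck_reordered, s));
}
```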
The compiler can't choose the second. The standard mandates that the first field must be aligned with the start of the structure.
Are you using offsetof from stddef.h to find this out?
C99 6.7.2.1p13:
A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
It means that you can have
struct s {
int x;
char y;
double z;
};
struct s obj;
int *x = (int *)&obj; /* Legal. */
Put another way:
offsetof(struct s, x); /* Must yield 0. */
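Both guarantees can be checked in a small, compilable sketch. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct s {
    int x;
    char y;
    double z;
};

/* No padding before the first member: a converted pointer to the struct
   points to its first member, and that member's offset is 0. */
int first_member_at_start(void)
{
    struct s obj = {0};
    int *x = (int *)&obj;
    return offsetof(struct s, x) == 0 && x == &obj.x;
}

/* Members are laid out in declaration order, so offsets increase. */
int members_in_order(void)
{
    return offsetof(struct s, x) < offsetof(struct s, y)
        && offsetof(struct s, y) < offsetof(struct s, z);
}
```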
Other than at the beginning of a structure, an implementation can put whatever padding it wants in your structures, so there's no right way. From C99 6.7.2.1 Structure and union specifiers, paragraphs:
/12:Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
/13: There may be unnamed padding within a structure object, but not at its beginning.
/15:There may be unnamed padding at the end of a structure or union.
Paragraph 13 also contains:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
This means that the fields within the structure cannot be re-ordered. And, in a large number of modern implementations (but this is not mandated by the standard), the alignment of an object is equal to its size. For example a 32-bit integer data type may have an alignment requirement of four (8-bit) bytes.
Hence, a logical layout would be:
offset size field
------ ---- -----
0 1 char c;
1 2 char cc[2];
3 1 padding
4 2 short s;
6 1 char ccc;
7 1 padding
but, as stated, it may be something different. The final padding is to ensure that consecutive array elements are aligned correctly (since the short most likely has to be on a 2-byte boundary).
There are a number of (non-portable) ways in which you may be able to control the padding. Many compilers have a #pragma pack option that you can use to control padding (although be careful: while some systems may just slow down when accessing unaligned data, some will actually dump core for an illegal access).
Also, re-ordering the elements within the structure from largest to smallest tends to reduce padding as well since the larger elements tend to have stricter alignment requirements.
These, and an even uglier "solution" are discussed more here.
While I do really understand your visual representation of the alignment, I can tell you that with VS you can achieve a packed structure by using 'pragma':
__pragma( pack(push, 1) )
struct { ... };
__pragma( pack(pop) )
In general struct-alignment depends on the compiler used, the target-platform (and its address-size) and the weather, IOW in reality it is not well defined.
Others have mentioned that padding may be introduced either between attributes or after the last attribute.
The interesting thing though, I believe, is to understand why.
Types usually have an alignment. This property specifies which addresses are valid (or not) for a particular type. On some architectures this is a loose requirement (if you do not respect it, you only incur some overhead); on others, violating it causes hardware exceptions.
For example (arbitrary, as each platform defines its own):
char: 1
short (16 bits): 2
int (32 bits): 4
long int (64 bits): 8
A compound type will usually have as alignment the maximum of the alignment of its parts.
How does alignment influence padding?
In order to respect the alignment of a type, some padding may be necessary, for example:
struct S { char a; int b; };
align(S) = max(align(a), align(b)) = max(1, 4) = 4
Thus we have:
// S allocated at address 16 (divisible by 4)
16 a
17
18
19
20 b
21 b
22 b
23 b
Note that because b can only be allocated at an address also divisible by 4, there is some space between a and b, this space is called padding.
Where does padding come from?
Padding may have two different reasons:
between attributes, it is caused by a difference in alignment (see above)
at the end of the struct, it is caused by array requirements
The array requirement is that elements of an array should be allocated without intervening padding. This allows one to use pointer arithmetic to navigate from an element to another:
+---+---+---+
| S | S | S |
+---+---+---+
S* p = /**/;
p = p + 1; // <=> p = (S*)((char*)p + sizeof(S));
This means, however, that the size of S must be a multiple of the alignment of S.
Example:
struct S { int a; char b; };
+----+-+---+
| a |b| ? |
+----+-+---+
a: offset 0, size 4
b: offset 4, size 1
?: offset 5, size 3 (padding)
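Both rules can be checked directly. sizeof(S) is a multiple of _Alignof(S) (C11), and consecutive array elements are exactly sizeof(S) bytes apart. The helper names below are hypothetical. On common targets with a 4-byte, 4-aligned int, sizeof(struct S) comes out as 8.

```c
#include <assert.h>
#include <stddef.h>

struct S { int a; char b; };   /* 3 bytes of tail padding on common targets */

/* The size must be a multiple of the alignment so array elements stay aligned. */
int size_is_multiple_of_alignment(void)
{
    return sizeof(struct S) % _Alignof(struct S) == 0;
}

/* Distance in bytes between two consecutive array elements. */
ptrdiff_t array_stride(void)
{
    struct S arr[2];
    return (char *)&arr[1] - (char *)&arr[0];
}
```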
Putting it altogether:
typedef struct {
char a;
char b[2];
short s;
char c;
} stuck;
+-+--+-+--+-+-+
|a| b|?|s |c|?|
+-+--+-+--+-+-+
If you really wish to avoid padding, one (simple) trick, which involves neither addition nor subtraction, is to order your attributes starting from the maximum alignment.
typedef struct {
short s;
char a;
char b[2];
char c;
} stuck;
+--+-+--+-+
| s|a| b|c|
+--+-+--+-+
It's a simple rule of thumb, especially as the alignment of basic types may change from platform to platform (32bits/64bits) whereas the relative order of the types is pretty stable (exception: the pointers).
Are the members of a structure packed in C/C++?
By packed I mean that they are compact and among the fields there aren't memory spaces.
That isn't what aligned means, and no, no particular alignment or packing is guaranteed. The elements will be in order, but the compiler can insert padding where it chooses. This actually creates (useful) alignment. E.g., for x86:
struct s
{
char c;
int i;
};
there will probably (but not necessarily) be three bytes between c and i. This allows i to be aligned on a word boundary, which can provide much faster memory access (on some architectures, it's required).
From C99 §6.7.2.1:
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
What you are asking for is packing, and alignment is different. Both are outside the scope of the language and are specific to each implementation. Take a look here.
Generally not. Some info here.
Depending on the compiler, you can introduce pragmas to help (from the link above):
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1) /* set alignment to 1 byte boundary */
struct MyPackedData
{
char Data1;
long Data2;
char Data3;
};
#pragma pack(pop) /* restore original alignment from stack */
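Typically (but with no guarantees), each member of a struct is aligned to a boundary suited to its type, which for int is usually the word size. This means that a field smaller than the member that follows it will be padded so that the next member starts on its boundary.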
However, when the next member of the struct can also fit inside the same word, then the compiler will put both members into the same word. This is more efficient space-wise, but depending on your platform, retrieving said members might be more expensive computationally.
On my 32-bit system using GCC under Cygwin, this program...
#include <iostream>
struct foo
{
char a;
int b;
char c;
};
int main(int argc, char** argv)
{
std::cout << sizeof(foo) << std::endl;
}
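outputs '12'. a is followed by 3 bytes of padding so that b starts on a 4-byte boundary, and c is followed by 3 bytes of tail padding so that the struct's size stays a multiple of int's alignment.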
However, switch the struct to
struct foo
{
char a;
char c;
int b;
};
and the output is '8', because the two adjacent chars share a single word: they are followed by 2 bytes of padding and then b.
It is possible to pack structures in order to conserve memory. For instance, pack(2) tells the compiler to place members longer than a byte on two-byte boundaries at most, so any padding is at most one byte. Sometimes packing is used as part of a communication protocol that expects a structure of a certain size. Here is what Wikipedia has to say about C/C++ and padding:
Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment. For example, if members are sorted by ascending or descending alignment requirements a minimal amount of padding is required. The minimal amount of padding required is always less than the largest alignment in the structure. Computing the maximum amount of padding required is more complicated, but is always less than the sum of the alignment requirements for all members minus twice the sum of the alignment requirements for the least aligned half of the structure members.
Although C and C++ do not allow the compiler to reorder structure members to save space, other languages might.
Since the compiler lays out structs in terms of words, care must sometimes be taken if you rely on a struct having a certain size, for instance when mixing char and int.
They are not packed by default. Instead, they are word-aligned depending on how your machine is set up. If you do want them to be packed, you can use __attribute__((__packed__)) at the end of your struct declaration like this:
struct abc {
char a;
int b;
char c;
}__attribute__((__packed__));
Then, for
struct abc _abc;
_abc will be packed.
Reference: Specific structure packing when using the GNU C Compiler
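Here is a small sketch comparing the default and packed layouts. It works on GCC and Clang only, since __attribute__((__packed__)) is a GNU extension. abc_default, abc_packed and read_b are hypothetical names. Accessing the member by name is fine because the compiler knows it may be misaligned. Taking its address as a plain int * is what can fault on strict-alignment CPUs.

```c
#include <assert.h>

/* Default layout: padding around the int. */
struct abc_default { char a; int b; char c; };

/* Packed layout (GNU extension): no padding at all. */
struct abc_packed {
    char a;
    int b;
    char c;
} __attribute__((__packed__));

/* Member access by name; the compiler emits a misalignment-safe load. */
int read_b(const struct abc_packed *p)
{
    return p->b;
}
```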
Seeing the outputs for some variations of the same structure may give a clue about what is going on. After reading this (if I understood it correctly), small types are padded to word length.
#include <iostream>
using std::cout;
using std::endl;
struct Foo {
char x ; // 1 byte
int y ; // 4 byte
char z ; // 1 byte
int w ; // 4 byte
};
struct FooOrdered {
char x ; // 1 byte
char z ; // 1 byte
int y ; // 4 byte
int w ; // 4 byte
};
struct Bar {
char x ; // 1 byte
int w ; // 4 byte
};
struct BarSingleType {
char x ; // 1 byte
};
int main(int argc, char const *argv[]) {
cout << sizeof(Foo) << endl;
cout << sizeof(FooOrdered) << endl;
cout << sizeof(Bar) << endl;
cout << sizeof(BarSingleType) << endl;
return 0;
}
In my environment output was like this:
16
12
8
1
I am curious to know why bit-fields with the same data type take less space than bit-fields with mixed data types.
struct xyz
{
int x : 1;
int y : 1;
int z : 1;
};
struct abc
{
char x : 1;
int y : 1;
bool z : 1;
};
sizeof(xyz) = 4
sizeof(abc) = 12.
I am using VS 2005 on a 64-bit x86 machine.
An answer at the machine/compiler level would be great.
Alignment.
Your compiler is going to align variables in a way that makes sense for your architecture. In your case, char, int, and bool are different sizes, so it will go by that information rather than your bit field hints.
There was some discussion in this question on the matter.
The solution is to give #pragma directives or __attribute__ annotations to your compiler to instruct it to ignore alignment optimizations.
The C standard (1999 version, §6.7.2.1, page 102, point 10) says this:
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit.
There does not seem to be any wording to allow the packing to be affected by the types of the fields. Thus I would conclude that this is a compiler bug.
gcc makes a 4 byte struct in either case, on both a 32-bit and a 64-bit machine, under Linux. I don't have VS and can't test that.
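Here is a quick way to reproduce the comparison in C. It uses only bit-field types that ISO C guarantees (_Bool and int) instead of char. On GCC and Clang, both structs are expected to be the size of one int. MSVC's layout rules (also described under GCC's -mms-bitfields option) start a new storage unit when the size of the declared type changes. That is consistent with the 12 bytes reported in the question. same_type, mixed_type and print_sizes are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* All bit-fields share one declared type, as in struct xyz. */
struct same_type { int x : 1; int y : 1; int z : 1; };

/* Mixed declared types, restricted to types ISO C guarantees for bit-fields. */
struct mixed_type { _Bool x : 1; int y : 1; _Bool z : 1; };

void print_sizes(void)
{
    printf("same_type: %zu, mixed_type: %zu\n",
           sizeof(struct same_type), sizeof(struct mixed_type));
}
```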
It's a compiler bug or some code error.
The size of a structure made of bit-fields is driven by the size of the largest data type declared in it.
For example, in struct xyz the largest data type is int, with size 4.
Similarly, in the second structure abc the largest data type is also int, with size 4.
Whereas if we change the members of the structure as follows:
struct abc
{
char a:1;
char b:1;
bool c:1;
};
then sizeof(abc) would be 1, not 4, since the largest data type has size 1 and all the bits fit into one char.
Various tests can be performed by changing the data types in the structure.
Output for the original structure: http://codepad.org/6j5z2CEX
Output for the structure defined above: http://codepad.org/fqF9Ob8W
To avoid such problems with the size of structures, pack them properly using the #pragma pack directive.