C++ Union/Struct Bitfield Implementation and Portability - c++

I have a union containing a uint16 and a struct like so:
union pData {
uint16_t w1;
struct {
uint8_t d1 : 8;
uint8_t d2 : 4;
bool b1 : 1;
bool b2 : 1;
bool b3 : 1;
bool b4 : 1;
} bits;
};
My colleague says there is an issue with this being portable, but I am not sure I buy this. Could some please explain (simple as possible) what is "wrong" here?

From C++17 12.2.4 Bit-fields /1 (and C++11 9.6 Bit-fields /1 for that matter, if you want the answer specific to your chosen tags):
Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. - end note]
Reliance on implementation defined behaviour means, by its very nature, non-portable code.

Maybe your colleague guesses that you intend to write to w1 and read from bits or vice versa.
That would be undefined behaviour. In C++ only one member of a union can be active at any one time; writing a member makes it active, and the behaviour of reading an inactive member is undefined.

Related

Is accessing bitfield unions common initial data undefined behavior in C++ standards

Similar to c union and bitfields but in C++ and including access to initial sequence
Similar to Union common initial sequence with primitive but using bitfields
What I want to do is this:
struct A
{
short common : 1;
short a1: 5;
short a2: 8;
};
struct B
{
short common : 1;
short b1: 3;
short b2: 4;
short b3: 6;
};
union C
{
A a;
B b;
};
Then use it as follows:
short foo(C data)
{
if (data.a.common)
{
return data.a.a1*data.a.a2;
}
return data.b.b1*data.b.b2*data.b.b3;
}
The issue seems to be that if data.a.common is false, the code will have accessed data set as a B via a member of A
There seems to be language in the standard about unions and similar initial sequences, but not sure common as a bitfield would count
9.5.1 it is permitted to inspect the common initial sequence of any of standard-layout struct members;
Any clarification would be appreciated
By the definition of common initial sequence:
The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_­unique_­address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field.
the common initial sequence of struct A and struct B consists of the first bit-field common.

How do bit fields interplay with bits padding in C++

See the C version of this questions here.
I have two questions concerning bit fields when there are padding bits.
Say I have a struct defined as
struct T {
unsigned int x: 1;
unsigned int y: 1;
};
Struct T only has two bits actually used.
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Question 2: Are those unused 30 bits always initialized to 0? What does the C++ standard say about it?
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Very platform dependent. The standard even has a note just to clarify how much:
[class.bit]
1 ...Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is
implementation-defined. Bit-fields are packed into some addressable
allocation unit. [ Note: Bit-fields straddle allocation units on some
machines and not on others. Bit-fields are assigned right-to-left on
some machines, left-to-right on others.  — end note ]
You can't assume much of anything about the object layout of a bit field.
Question 2: Are those unused 30 bits always initialized to 0? What does the C++ standard say about it?
Your example has a simple aggregate, so we can enumerate the possible initializations. Specifying no initializer...
T t;
... will default initialize it, leaving the members with indeterminate value. On the other hand, if you specify empty braces...
T t{};
... the object will be aggregate initialized, and so the bit fields will be initialized with {} themselves, and set to zero. But that applies only to the members of the aggregate, which are the bit fields. It's not specified what value, if any, the padding bits take. So we cannot assume they will be initialized to zero.
Q1: Usually from low to hi (i.e. x is 1 << 0, y is 1 << 1, etc).
Q2: The value of the unused bits is undefined. On some compilers/platforms, stack initialised variables might be set to zero first (might!!), but don't count on it!! Heap allocated variables could be anything, so it's best to assume the bits are garbage. Using a slightly non standard anonymous struct buried in a union, you could do something like this to ensure the value of the bits:
union T {
unsigned intval;
struct {
unsigned x : 1;
unsigned y : 1;
};
};
T foo;
foo.intval = 0;

C++ Union Member Access And Undefined Behaviour

I am currently working on a project in which I am provided the following
structre. My work is C++ but the project uses both C and C++. The same structure
definition is used by both C and C++.
typedef struct PacketHeader {
//Byte 0
uint8_t bRes :4;
uint8_t bEmpty :1;
uint8_t bWait :1;
uint8_t bErr :1;
uint8_t bEnable :1;
//Byte 1
uint8_t bInst :4;
uint8_t bCount :3;
uint8_t bRres :1;
//Bytes 2, 3
union {
uint16_t wId; /* Needed for Endian swapping */
struct{
uint16_t wMake :4;
uint16_t wMod :12;
};
};
} PacketHeader;
Depending on how instances of the structure are used, the required endianness of
the structure can be big or little endian. As the first two bytes of the
structure are each single bytes, these don't need altering when the endianness
changes.
Bytes 2 and 3, stored as a single uint16_t, are the only bytes which we need to
swap to acheive the desired endianness. To acheive the endianness swap, we have
been performing the following:
//Returns a constructed instance of PacketHeader with relevant fields set and the provided counter value
PacketHeader myHeader = mmt::BuildPacketHeader(count);
uint16_t packetIdFlipped;
//Swap positions of byte 2 and 3
packetIdFlipped = myHeader.wId << 8;
packetIdFlipped |= (uint16_t)myHeader.wId >> 8;
myHeader.wId = packetIdFlipped;
The function BuildPacketHeader(uint8_t) assigns values to the members wMake and
wMod explicitly, and does not write to the member wId. My question is regarding
the safety of reading from the member wId inside the returned instance of the
structure.
Questions such as
Accessing inactive union member and undefined behavior?,
Purpose of Unions in C and C++,
and Section 10.4 of the draft standard I have each mention the undefined behaviour arising from accessing an inactive member of a union in C++.
Paragraph 1 in Section 10.4 of the linked draft also contains the following note, though I'm not sure I understand all the terminology used:
[Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (10.3), and if a non-static datamember of an object of this standard-layout union type is active and is one of the standard-layout structs, itis permitted to inspect the common initial sequence of any of the standard-layout struct members; see 10.3.— end note]
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Is the unnamed struct the active member as it was the last member written to in the function call?
Or does the note mean it is safe to access the wId member, as it and the struct share a common type? (and is this what is meant by common initial sequence?)
Thanks in advance
The function BuildPacketHeader(uint8_t) assigns values to the members
wMake and wMod explicitly, and does not write to the member wId. My
question is regarding the safety of reading from the member wId inside
the returned instance of the structure.
Yes, it's UB. It does not mean it's not working, just that it may not work. You can use memcpy inside BuildPacketHeader to avoid that (see this and this).
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Yes. You assigned to wMake and wMod making the unamed struct the active member so wId is the inactive member and you are not allowed to read from it without setting a value to it.
and is this what is meant by common initial sequence?
The common initial sequence is when two standard layout types share the same members in the same order. In
struct foo
{
int a;
int b;
};
struct bar
{
int a;
int b;
int c;
};
a and b are of the same type in foo and bar so they are the common initial sequence of them. If you put objects of foo and bar in a union it would be safe to read a or b from wither object after it is set in one of them.
This is not your case though since wId isn't a standard layout type struct.
What the C++ standard kind of says is given two structs A and B and the following untion:
union U
{
A a;
B b;
};
The following is valid code:
U u;
A a;
u.a = a;
a = u.a;
B b;
u.b = b;
b = u.b;
You read and write the same type. This is obviously correct code.
But the problem comes when you have the following code:
A a;
B b;
u.a = a;
b = u.b;
What do we know about A and B? First in the union they share the same memory space. Now the C++ standard has plainly declared it as undefined behavior.
But that does not mean it's fully out of the window. C99 comes into play, since it is the normative base and there are weak guarantees about unions. That is if the union member have the same memory layout they are compatible and each structs first memory address is the same. So if you can ensure that your structs / union members are all padded in the correct way the operation is safe, even if C++ says it's undefined.
Finally from a pragmatic standpoint, if you don't mess with padding and get the standard layout, the compiler will generally do the right thing, since that is a quite old usage pattern in C and breaking this will break LOTS of code.

C/C++: uint8_t bitfields behave incorrectly and inconsistently

I have the following code:
struct Foo {
uint16_t a:5;
uint16_t b:5;
uint16_t c:5;
};
uint16_t bits = 0xdae;
const Foo& foo = *reinterpret_cast<const Foo*>(&bits);
printf("%x %x %x\n", foo.a, foo.b, foo.c);
The output is what I expect when I work it out on paper:
e d 3
If I use uint32_t instead of uint16_t bitfields, I get the same result.
But when I use uint8_t bitfields instead of uint16_t, I get an inconsistent result that is either:
e d 6
or
e d 16
Neither is correct. Why does using uint8_t for bitfields cause this strange behavior?
The compiler may add padding between structure members (or it may not, at its discretion) and you don't account for that when you just try to access the entire struct as a bunch of bits. Basically you can't just do that (the behaviour is undefined and your code is simply buggy). You need to access the structure members by name or use compiler specific extensions to control the layout/padding.
The memory layout of bit-fields is implementation defined. Confer, for example, this online draft c++ standard:
9.6 Bit-fields
(1) ... Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is
implementation-defined.
Thus, as you do not introduce explicit padding by bit-fields of length 0, you have no control on how the compiler will lay out your struct in memory. Actually, I think that you yield undefined behaviour, as you are converting one pointer to another one with probably different alignment requirements, and accessing padding bits (as it would happen with the assignment of a reinterpreted cast) is undefined behaviour, too.
Note that you can control padding of bit fields in a standardised way:
(2) A declaration for a bit-field that omits the identifier declares an
unnamed bit-field. Unnamed bit-fields are not members and cannot be
initialized. [Note: An unnamed bit-field is useful for padding to
conform to externally-imposed layouts. — end note ] As a special case,
an unnamed bit-field with a width of zero specifies alignment of the
next bit-field at an allocation unit boundary. Only when declaring an
unnamed bit-field may the value of the constant-expression be equal to
zero.
See the following code that makes use of this feature; Note that it yields a different result as I found no way of how to control padding such that I could get consecutive 5-bit-piecies. So I adapted the example to something that works on the level of byte borders:
struct Foo {
uint8_t a:5;
uint8_t :0;
uint8_t b:5;
uint8_t :0;
uint8_t c:5;
};
union DifferentView {
uint32_t bits;
struct Foo foo;
};
int main()
{
union DifferentView myView;
myView.bits = 0x0d0a0e;
printf("%x %x %x\n", myView.foo.a, myView.foo.b, myView.foo.c);
// Output: e a d
return 0;
}

C struct elements alignment (ansi)

just a simple question... what the standard says about the structure members alignment?
for example with this one:
struct
{
uint8_t a;
uint8_t b;
/* other members */
} test;
It is guarateed that b is at offset 1 from the struct start?
Thanks
The standard (as of C99) doesn't really say anything.
The only real guarantees are that (void *)&test == (void *)&a, and that a is at a lower address than b. Everything else is up to the implementation.
C11 6.7.2.1 Structure and union specifiers p14 says
Each non-bit-field member of a structure or union object is aligned in
an implementation- defined manner appropriate to its type.
meaning that you can't make any portable assumptions about the difference between the addresses of a and b.
It should be possible to use offsetof to determine the offset of members.
For C the alignment is implementation defined, we can see that in the draft C99 standard section 6.7.2.1 Structure and union specifiers paragraph 12(In C11 it would be paragraph 14) which says:
Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type.
and paragraph 13 says:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
and for C++ we have the following similar quotes from the draft standard section 9.2 Class members paragraph 13 says:
Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (Clause 11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other;
and paragraph 19 says:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. —end note ]
the case you're using is not really an edge case, both uint_8 are small enough to fit in the same word in memory and it would be no use to put each uint_8 in a uint_16.
A more critical case would be something like :
{
uint8_t a;
uint8_t b;
uint_32 c; // where is C, at &a+2 or &a+4 ?
/* other members */
} test;
and anyway this will always depend on the target architecture and your compiler...
K&R second edition (ANSI C) in chapter 6.4 (page 138) says:
Don't assume, however, that the size of a structure is the sum of the sizes of its memebers. Because of alignment requirements for different objects, there may be unnamed "holes" in a structure.
So no, ANSI C does not guarantee that b is at offset 1.
It is even likely that the compiler puts b at offset sizeof(int) so that it's aligned on the size of a machine word, which is easier to deal with.
Some compilers support pack-pragmas so that you can force that there are no such "holes" in the struct, but that is not portable.
What is guaranteed by the C-Standard already had been mentioned by other answers.
However, to make sure b is at offset 1 your compiler might offer options to "pack" the structure, will say to explicitly add no padding.
For gcc this can be achieved by the #pragma pack().
#pragma pack(1)
struct
{
uint8_t a; /* Guaranteed to be at offset 0. */
uint8_t b; /* Guaranteed to be at offset 1. */
/* other members are guaranteed to start at offset 2. */
} test_packed;
#pragma pack()
struct
{
uint8_t a; /* Guaranteed to by at offset 0. */
uint8_t b; /* NOT guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_unpacked;
A portable (and save) solution would be to simply use an array:
struct
{
uint8_t ab[2]; /* ab[0] is guaranteed to be at offset 0. */
/* ab[1] is guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_packed;