C/C++: uint8_t bitfields behave incorrectly and inconsistently - c++

I have the following code:
struct Foo {
uint16_t a:5;
uint16_t b:5;
uint16_t c:5;
};
uint16_t bits = 0xdae;
const Foo& foo = *reinterpret_cast<const Foo*>(&bits);
printf("%x %x %x\n", foo.a, foo.b, foo.c);
The output is what I expect when I work it out on paper:
e d 3
If I use uint32_t instead of uint16_t bitfields, I get the same result.
But when I use uint8_t bitfields instead of uint16_t, I get an inconsistent result that is either:
e d 6
or
e d 16
Neither is correct. Why does using uint8_t for bitfields cause this strange behavior?

The compiler may add padding between structure members (or it may not, at its discretion) and you don't account for that when you just try to access the entire struct as a bunch of bits. Basically you can't just do that (the behaviour is undefined and your code is simply buggy). You need to access the structure members by name or use compiler specific extensions to control the layout/padding.

The memory layout of bit-fields is implementation defined. Confer, for example, this online draft c++ standard:
9.6 Bit-fields
(1) ... Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is
implementation-defined.
Thus, as you do not introduce explicit padding by bit-fields of length 0, you have no control on how the compiler will lay out your struct in memory. Actually, I think that you yield undefined behaviour, as you are converting one pointer to another one with probably different alignment requirements, and accessing padding bits (as it would happen with the assignment of a reinterpreted cast) is undefined behaviour, too.
Note that you can control padding of bit fields in a standardised way:
(2) A declaration for a bit-field that omits the identifier declares an
unnamed bit-field. Unnamed bit-fields are not members and cannot be
initialized. [Note: An unnamed bit-field is useful for padding to
conform to externally-imposed layouts. — end note ] As a special case,
an unnamed bit-field with a width of zero specifies alignment of the
next bit-field at an allocation unit boundary. Only when declaring an
unnamed bit-field may the value of the constant-expression be equal to
zero.
See the following code that makes use of this feature; Note that it yields a different result as I found no way of how to control padding such that I could get consecutive 5-bit-piecies. So I adapted the example to something that works on the level of byte borders:
struct Foo {
uint8_t a:5;
uint8_t :0;
uint8_t b:5;
uint8_t :0;
uint8_t c:5;
};
union DifferentView {
uint32_t bits;
struct Foo foo;
};
int main()
{
union DifferentView myView;
myView.bits = 0x0d0a0e;
printf("%x %x %x\n", myView.foo.a, myView.foo.b, myView.foo.c);
// Output: e a d
return 0;
}

Related

How do bit fields interplay with bits padding in C++

See the C version of this questions here.
I have two questions concerning bit fields when there are padding bits.
Say I have a struct defined as
struct T {
unsigned int x: 1;
unsigned int y: 1;
};
Struct T only has two bits actually used.
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Question 2: Are those unused 30 bits always initialized to 0? What does the C++ standard say about it?
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Very platform dependent. The standard even has a note just to clarify how much:
[class.bit]
1 ...Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is
implementation-defined. Bit-fields are packed into some addressable
allocation unit. [ Note: Bit-fields straddle allocation units on some
machines and not on others. Bit-fields are assigned right-to-left on
some machines, left-to-right on others.  — end note ]
You can't assume much of anything about the object layout of a bit field.
Question 2: Are those unused 30 bits always initialized to 0? What does the C++ standard say about it?
Your example has a simple aggregate, so we can enumerate the possible initializations. Specifying no initializer...
T t;
... will default initialize it, leaving the members with indeterminate value. On the other hand, if you specify empty braces...
T t{};
... the object will be aggregate initialized, and so the bit fields will be initialized with {} themselves, and set to zero. But that applies only to the members of the aggregate, which are the bit fields. It's not specified what value, if any, the padding bits take. So we cannot assume they will be initialized to zero.
Q1: Usually from low to hi (i.e. x is 1 << 0, y is 1 << 1, etc).
Q2: The value of the unused bits is undefined. On some compilers/platforms, stack initialised variables might be set to zero first (might!!), but don't count on it!! Heap allocated variables could be anything, so it's best to assume the bits are garbage. Using a slightly non standard anonymous struct buried in a union, you could do something like this to ensure the value of the bits:
union T {
unsigned intval;
struct {
unsigned x : 1;
unsigned y : 1;
};
};
T foo;
foo.intval = 0;

C++ Union Member Access And Undefined Behaviour

I am currently working on a project in which I am provided the following
structre. My work is C++ but the project uses both C and C++. The same structure
definition is used by both C and C++.
typedef struct PacketHeader {
//Byte 0
uint8_t bRes :4;
uint8_t bEmpty :1;
uint8_t bWait :1;
uint8_t bErr :1;
uint8_t bEnable :1;
//Byte 1
uint8_t bInst :4;
uint8_t bCount :3;
uint8_t bRres :1;
//Bytes 2, 3
union {
uint16_t wId; /* Needed for Endian swapping */
struct{
uint16_t wMake :4;
uint16_t wMod :12;
};
};
} PacketHeader;
Depending on how instances of the structure are used, the required endianness of
the structure can be big or little endian. As the first two bytes of the
structure are each single bytes, these don't need altering when the endianness
changes.
Bytes 2 and 3, stored as a single uint16_t, are the only bytes which we need to
swap to acheive the desired endianness. To acheive the endianness swap, we have
been performing the following:
//Returns a constructed instance of PacketHeader with relevant fields set and the provided counter value
PacketHeader myHeader = mmt::BuildPacketHeader(count);
uint16_t packetIdFlipped;
//Swap positions of byte 2 and 3
packetIdFlipped = myHeader.wId << 8;
packetIdFlipped |= (uint16_t)myHeader.wId >> 8;
myHeader.wId = packetIdFlipped;
The function BuildPacketHeader(uint8_t) assigns values to the members wMake and
wMod explicitly, and does not write to the member wId. My question is regarding
the safety of reading from the member wId inside the returned instance of the
structure.
Questions such as
Accessing inactive union member and undefined behavior?,
Purpose of Unions in C and C++,
and Section 10.4 of the draft standard I have each mention the undefined behaviour arising from accessing an inactive member of a union in C++.
Paragraph 1 in Section 10.4 of the linked draft also contains the following note, though I'm not sure I understand all the terminology used:
[Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (10.3), and if a non-static datamember of an object of this standard-layout union type is active and is one of the standard-layout structs, itis permitted to inspect the common initial sequence of any of the standard-layout struct members; see 10.3.— end note]
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Is the unnamed struct the active member as it was the last member written to in the function call?
Or does the note mean it is safe to access the wId member, as it and the struct share a common type? (and is this what is meant by common initial sequence?)
Thanks in advance
The function BuildPacketHeader(uint8_t) assigns values to the members
wMake and wMod explicitly, and does not write to the member wId. My
question is regarding the safety of reading from the member wId inside
the returned instance of the structure.
Yes, it's UB. It does not mean it's not working, just that it may not work. You can use memcpy inside BuildPacketHeader to avoid that (see this and this).
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Yes. You assigned to wMake and wMod making the unamed struct the active member so wId is the inactive member and you are not allowed to read from it without setting a value to it.
and is this what is meant by common initial sequence?
The common initial sequence is when two standard layout types share the same members in the same order. In
struct foo
{
int a;
int b;
};
struct bar
{
int a;
int b;
int c;
};
a and b are of the same type in foo and bar so they are the common initial sequence of them. If you put objects of foo and bar in a union it would be safe to read a or b from wither object after it is set in one of them.
This is not your case though since wId isn't a standard layout type struct.
What the C++ standard kind of says is given two structs A and B and the following untion:
union U
{
A a;
B b;
};
The following is valid code:
U u;
A a;
u.a = a;
a = u.a;
B b;
u.b = b;
b = u.b;
You read and write the same type. This is obviously correct code.
But the problem comes when you have the following code:
A a;
B b;
u.a = a;
b = u.b;
What do we know about A and B? First in the union they share the same memory space. Now the C++ standard has plainly declared it as undefined behavior.
But that does not mean it's fully out of the window. C99 comes into play, since it is the normative base and there are weak guarantees about unions. That is if the union member have the same memory layout they are compatible and each structs first memory address is the same. So if you can ensure that your structs / union members are all padded in the correct way the operation is safe, even if C++ says it's undefined.
Finally from a pragmatic standpoint, if you don't mess with padding and get the standard layout, the compiler will generally do the right thing, since that is a quite old usage pattern in C and breaking this will break LOTS of code.

C++ Union/Struct Bitfield Implementation and Portability

I have a union containing a uint16 and a struct like so:
union pData {
uint16_t w1;
struct {
uint8_t d1 : 8;
uint8_t d2 : 4;
bool b1 : 1;
bool b2 : 1;
bool b3 : 1;
bool b4 : 1;
} bits;
};
My colleague says there is an issue with this being portable, but I am not sure I buy this. Could some please explain (simple as possible) what is "wrong" here?
From C++17 12.2.4 Bit-fields /1 (and C++11 9.6 Bit-fields /1 for that matter, if you want the answer specific to your chosen tags):
Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. - end note]
Reliance on implementation defined behaviour means, by its very nature, non-portable code.
Maybe your colleague guesses that you intend to write to w1 and read from bits or vice versa.
That would be undefined behaviour. In C++ only one member of a union can be active at any one time; writing a member makes it active, and the behaviour of reading an inactive member is undefined.

C struct elements alignment (ansi)

just a simple question... what the standard says about the structure members alignment?
for example with this one:
struct
{
uint8_t a;
uint8_t b;
/* other members */
} test;
It is guarateed that b is at offset 1 from the struct start?
Thanks
The standard (as of C99) doesn't really say anything.
The only real guarantees are that (void *)&test == (void *)&a, and that a is at a lower address than b. Everything else is up to the implementation.
C11 6.7.2.1 Structure and union specifiers p14 says
Each non-bit-field member of a structure or union object is aligned in
an implementation- defined manner appropriate to its type.
meaning that you can't make any portable assumptions about the difference between the addresses of a and b.
It should be possible to use offsetof to determine the offset of members.
For C the alignment is implementation defined, we can see that in the draft C99 standard section 6.7.2.1 Structure and union specifiers paragraph 12(In C11 it would be paragraph 14) which says:
Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type.
and paragraph 13 says:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
and for C++ we have the following similar quotes from the draft standard section 9.2 Class members paragraph 13 says:
Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (Clause 11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other;
and paragraph 19 says:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. —end note ]
the case you're using is not really an edge case, both uint_8 are small enough to fit in the same word in memory and it would be no use to put each uint_8 in a uint_16.
A more critical case would be something like :
{
uint8_t a;
uint8_t b;
uint_32 c; // where is C, at &a+2 or &a+4 ?
/* other members */
} test;
and anyway this will always depend on the target architecture and your compiler...
K&R second edition (ANSI C) in chapter 6.4 (page 138) says:
Don't assume, however, that the size of a structure is the sum of the sizes of its memebers. Because of alignment requirements for different objects, there may be unnamed "holes" in a structure.
So no, ANSI C does not guarantee that b is at offset 1.
It is even likely that the compiler puts b at offset sizeof(int) so that it's aligned on the size of a machine word, which is easier to deal with.
Some compilers support pack-pragmas so that you can force that there are no such "holes" in the struct, but that is not portable.
What is guaranteed by the C-Standard already had been mentioned by other answers.
However, to make sure b is at offset 1 your compiler might offer options to "pack" the structure, will say to explicitly add no padding.
For gcc this can be achieved by the #pragma pack().
#pragma pack(1)
struct
{
uint8_t a; /* Guaranteed to be at offset 0. */
uint8_t b; /* Guaranteed to be at offset 1. */
/* other members are guaranteed to start at offset 2. */
} test_packed;
#pragma pack()
struct
{
uint8_t a; /* Guaranteed to by at offset 0. */
uint8_t b; /* NOT guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_unpacked;
A portable (and save) solution would be to simply use an array:
struct
{
uint8_t ab[2]; /* ab[0] is guaranteed to be at offset 0. */
/* ab[1] is guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_packed;

C/C++ Pointer to a POD struct also points to the 1st struct member

Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.
For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a lightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
Reference (see page 103)
The compiler is free to add padding and reorganize the struct how it sees fit. Especially in C++ you can add (virtual) functions and then chances are that the virtual table is hidden before that. But of course that are implementation details.
For C this assumption is valid.
For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So given a struct like
struct test{
char ch;
int i;
}
will have ch at offset 0, then a padding byte to align, i at offset 2 and then at the end, padding bytes are added to make the struct size a multiple of 8 bytes.(on a 64-bit machine, 4 byte alignment may be permitted in 32 bit machines)
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.