C++ Union Member Access And Undefined Behaviour - c++

I am currently working on a project in which I am provided the following
structure. My work is in C++, but the project uses both C and C++. The same structure
definition is used by both C and C++.
typedef struct PacketHeader {
//Byte 0
uint8_t bRes :4;
uint8_t bEmpty :1;
uint8_t bWait :1;
uint8_t bErr :1;
uint8_t bEnable :1;
//Byte 1
uint8_t bInst :4;
uint8_t bCount :3;
uint8_t bRres :1;
//Bytes 2, 3
union {
uint16_t wId; /* Needed for Endian swapping */
struct{
uint16_t wMake :4;
uint16_t wMod :12;
};
};
} PacketHeader;
Depending on how instances of the structure are used, the required endianness of
the structure can be big or little endian. As the first two bytes of the
structure are each single bytes, these don't need altering when the endianness
changes.
Bytes 2 and 3, stored as a single uint16_t, are the only bytes which we need to
swap to achieve the desired endianness. To achieve the endianness swap, we have
been performing the following:
//Returns a constructed instance of PacketHeader with relevant fields set and the provided counter value
PacketHeader myHeader = mmt::BuildPacketHeader(count);
uint16_t packetIdFlipped;
//Swap positions of byte 2 and 3
packetIdFlipped = myHeader.wId << 8;
packetIdFlipped |= (uint16_t)myHeader.wId >> 8;
myHeader.wId = packetIdFlipped;
The function BuildPacketHeader(uint8_t) assigns values to the members wMake and
wMod explicitly, and does not write to the member wId. My question is regarding
the safety of reading from the member wId inside the returned instance of the
structure.
Questions such as
Accessing inactive union member and undefined behavior?,
Purpose of Unions in C and C++,
and Section 10.4 of the draft standard I have each mention the undefined behaviour arising from accessing an inactive member of a union in C++.
Paragraph 1 in Section 10.4 of the linked draft also contains the following note, though I'm not sure I understand all the terminology used:
[Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (10.3), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see 10.3. — end note]
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Is the unnamed struct the active member as it was the last member written to in the function call?
Or does the note mean it is safe to access the wId member, as it and the struct share a common type? (and is this what is meant by common initial sequence?)
Thanks in advance

The function BuildPacketHeader(uint8_t) assigns values to the members
wMake and wMod explicitly, and does not write to the member wId. My
question is regarding the safety of reading from the member wId inside
the returned instance of the structure.
Yes, it's UB. That does not mean it will fail, only that the standard gives no guarantee that it works. You can use memcpy inside BuildPacketHeader to avoid that (see this and this).
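One way to follow the memcpy suggestion is sketched below. It is a minimal illustration, not the project's code: IdField and SwapIdBytes are hypothetical names, and the inner struct is given a name (parts) because anonymous structs are a compiler extension in C++. The function copies the two bytes of the union's storage into a local uint16_t, swaps them, and assigns the result to wId, which makes wId the active member from then on.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for bytes 2 and 3 of PacketHeader.
union IdField {
    std::uint16_t wId;
    struct {
        std::uint16_t wMake : 4;
        std::uint16_t wMod  : 12;
    } parts;
};

// Assumption of this sketch: the union occupies exactly two bytes.
static_assert(sizeof(IdField) == sizeof(std::uint16_t),
              "IdField is expected to be two bytes with no padding");

// Swap the two bytes of the field without reading an inactive member.
inline void SwapIdBytes(IdField& field) {
    std::uint16_t raw;
    std::memcpy(&raw, &field, sizeof raw);  // copy the object representation
    raw = static_cast<std::uint16_t>((raw << 8) | (raw >> 8));
    field.wId = raw;                        // writing wId makes it the active member
}
```

After the call, field.wId can be read directly; parts should not be read again until it has been written.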

Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Yes. You assigned to wMake and wMod, making the unnamed struct the active member, so wId is the inactive member and you are not allowed to read from it without first assigning a value to it.
and is this what is meant by common initial sequence?
The common initial sequence is when two standard layout types share the same members in the same order. In
struct foo
{
int a;
int b;
};
struct bar
{
int a;
int b;
int c;
};
a and b are of the same type in foo and bar, so they are the common initial sequence of the two structs. If you put objects of foo and bar in a union, it would be safe to read a or b from either object after it is set in one of them.
This does not apply to your case, though, since wId is a uint16_t and not a standard-layout struct.
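As a minimal sketch of the rule, the foo and bar structs above can be placed in a union and a read through the inactive member. FooOrBar and ReadAThroughBar are hypothetical names introduced only for this illustration.

```cpp
#include <type_traits>

struct foo { int a; int b; };
struct bar { int a; int b; int c; };

// Hypothetical union holding both structs.
union FooOrBar {
    foo f;
    bar g;
};

static_assert(std::is_standard_layout<foo>::value &&
              std::is_standard_layout<bar>::value,
              "the common-initial-sequence rule only covers standard-layout structs");

// Reads a through bar even when foo is the active member. This is permitted
// because a lies in the common initial sequence of foo and bar. Reading g.c
// in the same situation would not be covered by the rule.
inline int ReadAThroughBar(const FooOrBar& u) {
    return u.g.a;
}
```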

What the C++ standard says, roughly, is this. Given two structs A and B and the following union:
union U
{
A a;
B b;
};
The following is valid code:
U u;
A a;
u.a = a;
a = u.a;
B b;
u.b = b;
b = u.b;
You read and write the same type. This is obviously correct code.
But the problem comes when you have the following code:
A a;
B b;
u.a = a;
b = u.b;
What do we know about A and B? First, in the union they share the same storage. The C++ standard plainly declares this read to be undefined behaviour.
But that does not mean it cannot work. C99 comes into play, since the pattern originates there: C permits reading a union member other than the one last stored, reinterpreting the bytes as the new type, and structs in a union that share a common initial sequence may be inspected through either member. So if you can ensure that your structs / union members have the same layout and padding, the operation generally works, even though C++ says it's undefined.
Finally, from a pragmatic standpoint, if you don't interfere with padding and your types are standard-layout, compilers will generally do the right thing, since this is a very old usage pattern in C and breaking it would break a lot of code.

Related

Is the address of the first data member of an instance the same as the address of the instance? [duplicate]

Can I assume that a C/C++ struct pointer will always point to the first member?
Example 1:
typedef struct {
unsigned char array_a[2];
unsigned char array_b[5];
}test;
//..
test var;
//..
In the above example will &var always point to array_a?
Also in the above example is it possible to cast the pointer
to an unsigned char pointer and access each byte separately?
Example 2:
function((unsigned char *)&var,sizeof(test));
//...
//...
void function(unsigned char *array, int len){
int i;
for( i=0; i<len; i++){
array[i]++;
}
}
Will that work correctly?
Note: I know that chars are byte aligned in a struct therefore I assume the size of the above struct is 7 bytes.
For C structs, yes, you can rely on it. This is how almost all "object orientated"-style APIs work in C (such as GObject and GTK).
For C++, you can rely on it only for "plain old data" (POD) types, which are guaranteed to be laid out in memory the same way as C structs. Exactly what constitutes a POD type is a little complicated and has changed between C++03 and C++11, but the crux of it is that if your type has any virtual functions then it's not a POD.
(In C++11 you can use std::is_pod to test at compile-time whether a struct is a POD type.)
EDIT: This tells you what constitutes a POD type in C++: http://en.cppreference.com/w/cpp/concept/PODType
EDIT2: Actually, in C++11, it doesn't need to be a POD, just "standard layout", which is a slightly weaker condition. Quoth section 9.2 [class.mem] paragraph 20 of the standard:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. — end note ]
From the C99 standard section 6.7.2.1 bullet point 13:
Within a structure object, the non-bit-field members and the units in
which bit-fields reside have addresses that increase in the order in
which they are declared. A pointer to a structure object, suitably
converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa.
There may be unnamed padding within a structure object, but not at its
beginning.
The answer to your question is therefore yes.
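The two guarantees quoted above can be checked at compile time and exercised at run time. The following is a small sketch using the struct from the question; IncrementBytes is a hypothetical name for the question's function, with the length changed to std::size_t.

```cpp
#include <cstddef>
#include <type_traits>

struct test {
    unsigned char array_a[2];
    unsigned char array_b[5];
};

// The pointer-to-first-member guarantee applies to standard-layout types.
static_assert(std::is_standard_layout<test>::value,
              "test must be standard-layout");
static_assert(offsetof(test, array_a) == 0,
              "the first member is expected at offset 0");

// Increment every byte of an object through an unsigned char pointer.
// Padding bytes, if the implementation adds any, are incremented as well.
inline void IncrementBytes(unsigned char* bytes, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        ++bytes[i];
    }
}
```

A call such as IncrementBytes(reinterpret_cast<unsigned char*>(&var), sizeof var) then touches every byte of var, starting with array_a[0].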
Reference (see page 103)
The compiler is free to add padding between members, and in C++ the relative order of members with different access control is unspecified. Especially in C++ you can add (virtual) functions, and then chances are that the virtual table pointer is placed before the first member. But of course those are implementation details.
For C this assumption is valid.
For C, it's largely implementation-specific, but in practice the rule (in the absence of #pragma pack or something likewise) is:
Struct members are stored in the order they are declared. (This is required by the C99 standard, as mentioned here earlier.)
If necessary, padding is added before each struct member, to ensure correct alignment.
So a struct like
struct test{
char ch;
int i;
};
will have ch at offset 0, then padding bytes to align i, then i at the next offset that satisfies the alignment of int (offset 4 with a typical 4-byte int), and at the end padding bytes are added to make the struct size a multiple of its strictest member alignment (8 bytes in total in this typical case).
So at least in this case, for C, I think you can assume that the struct pointer will point to the first array.

Is accessing bitfield unions common initial data undefined behavior in C++ standards

Similar to c union and bitfields but in C++ and including access to initial sequence
Similar to Union common initial sequence with primitive but using bitfields
What I want to do is this:
struct A
{
short common : 1;
short a1: 5;
short a2: 8;
};
struct B
{
short common : 1;
short b1: 3;
short b2: 4;
short b3: 6;
};
union C
{
A a;
B b;
};
Then use it as follows:
short foo(C data)
{
if (data.a.common)
{
return data.a.a1*data.a.a2;
}
return data.b.b1*data.b.b2*data.b.b3;
}
The issue seems to be that if data.a.common is false, the code will have accessed data set as a B via a member of A
There seems to be language in the standard about unions and similar initial sequences, but not sure common as a bitfield would count
9.5.1 it is permitted to inspect the common initial sequence of any of standard-layout struct members;
Any clarification would be appreciated
By the definition of common initial sequence:
The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_unique_address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field.
the common initial sequence of struct A and struct B consists of the first bit-field common.
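Under that reading, the function from the question can be sketched as follows. Two things differ from the question and are assumptions of this sketch: the bit-fields use unsigned short (a signed 1-bit field cannot hold the value 1), and foo returns int and takes its argument by const reference. The convention that common == 1 means a is the active member is taken from the question.

```cpp
#include <type_traits>

struct A {
    unsigned short common : 1;
    unsigned short a1 : 5;
    unsigned short a2 : 8;
};

struct B {
    unsigned short common : 1;
    unsigned short b1 : 3;
    unsigned short b2 : 4;
    unsigned short b3 : 6;
};

union C {
    A a;
    B b;
};

static_assert(std::is_standard_layout<A>::value &&
              std::is_standard_layout<B>::value,
              "both structs must be standard-layout");

// data.a.common may be inspected even when b is active, because common is
// the common initial sequence of A and B. Every other member is read only
// through the struct that the flag says is active.
inline int foo(const C& data) {
    if (data.a.common) {
        return data.a.a1 * data.a.a2;
    }
    return data.b.b1 * data.b.b2 * data.b.b3;
}
```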

C/C++: uint8_t bitfields behave incorrectly and inconsistently

I have the following code:
struct Foo {
uint16_t a:5;
uint16_t b:5;
uint16_t c:5;
};
uint16_t bits = 0xdae;
const Foo& foo = *reinterpret_cast<const Foo*>(&bits);
printf("%x %x %x\n", foo.a, foo.b, foo.c);
The output is what I expect when I work it out on paper:
e d 3
If I use uint32_t instead of uint16_t bitfields, I get the same result.
But when I use uint8_t bitfields instead of uint16_t, I get an inconsistent result that is either:
e d 6
or
e d 16
Neither is correct. Why does using uint8_t for bitfields cause this strange behavior?
The compiler may add padding between structure members (or it may not, at its discretion) and you don't account for that when you just try to access the entire struct as a bunch of bits. Basically you can't just do that (the behaviour is undefined and your code is simply buggy). You need to access the structure members by name or use compiler specific extensions to control the layout/padding.
The memory layout of bit-fields is implementation defined. Confer, for example, this online draft c++ standard:
9.6 Bit-fields
(1) ... Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is
implementation-defined.
Thus, as you do not introduce explicit padding with bit-fields of length 0, you have no control over how the compiler lays out your struct in memory. Actually, I think that you invoke undefined behaviour, as you are converting one pointer to another one with possibly different alignment requirements, and accessing padding bits (as would happen when reading through the reinterpreted pointer) is undefined behaviour, too.
Note that you can control padding of bit fields in a standardised way:
(2) A declaration for a bit-field that omits the identifier declares an
unnamed bit-field. Unnamed bit-fields are not members and cannot be
initialized. [Note: An unnamed bit-field is useful for padding to
conform to externally-imposed layouts. — end note ] As a special case,
an unnamed bit-field with a width of zero specifies alignment of the
next bit-field at an allocation unit boundary. Only when declaring an
unnamed bit-field may the value of the constant-expression be equal to
zero.
See the following code, which makes use of this feature. Note that it yields a different result, as I found no way to control padding such that I could get consecutive 5-bit pieces. So I adapted the example to something that works at byte boundaries:
#include <stdint.h>
#include <stdio.h>
struct Foo {
uint8_t a:5;
uint8_t :0;
uint8_t b:5;
uint8_t :0;
uint8_t c:5;
};
union DifferentView {
uint32_t bits;
struct Foo foo;
};
int main()
{
union DifferentView myView;
myView.bits = 0x0d0a0e;
printf("%x %x %x\n", myView.foo.a, myView.foo.b, myView.foo.c);
// Output (little-endian target, bit-fields allocated from the least significant bit): e a d
return 0;
}
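If the goal is only to pull three consecutive 5-bit values out of a 16-bit integer, as in the question, shifts and masks avoid the bit-field layout problem entirely. This is a sketch of that alternative rather than something the answers above propose; Fields and UnpackFields are hypothetical names.

```cpp
#include <cstdint>

// Plain struct used only to return the three values together.
struct Fields {
    unsigned a;
    unsigned b;
    unsigned c;
};

// Extract three consecutive 5-bit values, lowest bits first.
// The result depends only on the numeric value of bits, not on how the
// compiler lays out bit-fields or on the byte order of the target.
constexpr Fields UnpackFields(std::uint16_t bits) {
    return Fields{bits & 0x1Fu, (bits >> 5) & 0x1Fu, (bits >> 10) & 0x1Fu};
}
```

For the value 0xdae from the question, UnpackFields yields a = 0xe, b = 0xd and c = 0x3, which matches the result worked out on paper.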

C struct elements alignment (ansi)

Just a simple question: what does the standard say about the alignment of structure members?
For example, with this one:
struct
{
uint8_t a;
uint8_t b;
/* other members */
} test;
Is it guaranteed that b is at offset 1 from the start of the struct?
Thanks
The standard (as of C99) doesn't really say anything.
The only real guarantees are that (void *)&test == (void *)&test.a, and that a is at a lower address than b. Everything else is up to the implementation.
C11 6.7.2.1 Structure and union specifiers p14 says
Each non-bit-field member of a structure or union object is aligned in
an implementation- defined manner appropriate to its type.
meaning that you can't make any portable assumptions about the difference between the addresses of a and b.
It should be possible to use offsetof to determine the offset of members.
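A short sketch of the offsetof approach: if code depends on b being at offset 1, a static_assert turns that assumption into a compile-time error on any implementation where it does not hold. The struct is given the hypothetical name Test here, and a uint32_t member c is added as an example of the "other members".

```cpp
#include <cstddef>
#include <cstdint>

struct Test {
    std::uint8_t a;
    std::uint8_t b;
    std::uint32_t c;  // example of a further member, not in the question
};

// Guaranteed by the standard for a standard-layout struct.
static_assert(offsetof(Test, a) == 0, "a must be the initial member");

// NOT guaranteed by the standard: this line documents and enforces the
// assumption instead of silently relying on it.
static_assert(offsetof(Test, b) == 1, "this code assumes b directly follows a");

// The offset of c is whatever the implementation chooses; query it rather
// than hard-coding 2 or 4.
constexpr std::size_t kOffsetOfC = offsetof(Test, c);
```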
For C the alignment is implementation defined, we can see that in the draft C99 standard section 6.7.2.1 Structure and union specifiers paragraph 12(In C11 it would be paragraph 14) which says:
Each non-bit-field member of a structure or union object is aligned in an implementation defined manner appropriate to its type.
and paragraph 13 says:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
and for C++ we have the following similar quotes from the draft standard section 9.2 Class members paragraph 13 says:
Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (Clause 11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other;
and paragraph 19 says:
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its
initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note:
There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning,
as necessary to achieve appropriate alignment. —end note ]
The case you're using is not really an edge case: both uint8_t members are small enough to fit in the same word in memory, and there would be no point in placing each uint8_t in its own uint16_t.
A more critical case would be something like:
struct {
uint8_t a;
uint8_t b;
uint32_t c; // where is c: at offset 2 or at offset 4?
/* other members */
} test;
And anyway, this will always depend on the target architecture and your compiler.
K&R second edition (ANSI C) in chapter 6.4 (page 138) says:
Don't assume, however, that the size of a structure is the sum of the sizes of its members. Because of alignment requirements for different objects, there may be unnamed "holes" in a structure.
So no, ANSI C does not guarantee that b is at offset 1.
The standard would even allow the compiler to put b at offset sizeof(int) so that it's aligned on the size of a machine word, which can be easier to deal with, although common compilers place adjacent uint8_t members at consecutive offsets.
Some compilers support pack-pragmas so that you can force that there are no such "holes" in the struct, but that is not portable.
What is guaranteed by the C-Standard already had been mentioned by other answers.
However, to make sure b is at offset 1, your compiler might offer options to "pack" the structure, that is, to explicitly add no padding.
For gcc this can be achieved with #pragma pack.
#pragma pack(1)
struct
{
uint8_t a; /* Guaranteed to be at offset 0. */
uint8_t b; /* Guaranteed to be at offset 1. */
/* other members are guaranteed to start at offset 2. */
} test_packed;
#pragma pack()
struct
{
uint8_t a; /* Guaranteed to be at offset 0. */
uint8_t b; /* NOT guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_unpacked;
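The effect of packing is easiest to see when a wider member follows a single byte. The sketch below uses the push/pop form of the pragma, which GCC, Clang and MSVC accept but which is not part of the C or C++ standard; PackedPair and NaturalPair are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)   // compiler extension: no padding between members
struct PackedPair {
    std::uint8_t a;
    std::uint32_t c;
};
#pragma pack(pop)       // restore the previous packing setting

struct NaturalPair {
    std::uint8_t a;
    std::uint32_t c;
};

// These fail to compile if the pragma was ignored by the compiler.
static_assert(offsetof(PackedPair, c) == 1, "packing was not applied");
static_assert(sizeof(PackedPair) == 5, "packing was not applied");

// Without packing, padding may be inserted before c and at the end.
static_assert(sizeof(NaturalPair) >= sizeof(PackedPair),
              "the natural layout is never smaller than the packed one");
```

Members of a packed struct may be misaligned, so they are best accessed by name rather than through a pointer to the member.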
A portable (and safe) solution would be to simply use an array:
struct
{
uint8_t ab[2]; /* ab[0] is guaranteed to be at offset 0. */
/* ab[1] is guaranteed to be at offset 1. */
/* other members are NOT guaranteed to start at offset 2. */
} test_array;
