I came across the initial sequence concept. Serching through the Standard for initial sequence phrase gives only 3 results and they don't give a definition.
Section N3797::9.5/1 [class.union]:
If a standard-layout union contains several standard-layout structs
that share a common initial sequence (9.2), and if an object of this
standard-layout union type contains one of the standard-layout
structs, it is permitted to inspect the common initial sequence of any
of standard-layout struct members;
I wish to look at an example which is demostrated that quote.
I believe it's talking about this kind of thing:
union U {
struct S {
int a;
int b;
int c;
}
struct T {
int x;
int y;
float f;
}
};
It's saying that it's OK to access either U.S.a or U.T.x and that they will be equivalent. Ditto for U.S.b and U.T.y of course. But accessing U.T.f after setting U.S.c would be undefined behaviour.
Related
Below is excerpted from cppref but reduced to demo:
#include <iostream>
struct Empty {}; // empty class
struct W
{
char c[2];
[[no_unique_address]] Empty e1, e2;
};
int main()
{
std::cout << std::boolalpha;
// e1 and e2 cannot have the same address, but one of them can share with
// c[0] and the other with c[1]
std::cout << "sizeof(W) == 2 is " << (sizeof(W) == 2) << '\n';
}
The documentation says the output might be:
sizeof(W) == 2 is true
However, both gcc and clang output as follows:
sizeof(W) == 2 is false
Is it implementation-defined that how to deal with [[no_unique_address]]?
See [intro.object]/8:
An object has nonzero size if ... Otherwise, if the object is a base class subobject of a standard-layout class type with no non-static data members, it has zero size.
Otherwise, the circumstances under which the object has zero size are implementation-defined.
Empty base class optimization became mandatory for standard-layout classes in C++11 (see here for discussion). Empty member optimization is never mandatory. It is implementation-defined, as you suspected.
Yes, virtually all aspects of no_unique_address are implementation-defined. It is a tool to allow for optimizations, not to enforce them.
That being said, you should never assume that no_unique_address will work when you attempt to have two subobjects with the same type. The standard still requires that all distinct subobjects of the same type have different addresses, no_unique_address or not. And while it is possible that the compiler could assign these empty subobjects distinct addresses by radically reordering the members... they're pretty much not going to do that.
Your best bet for reasonably taking advantage of no_unique_address optimizations is to never have two subobjects of the same type, and try to put all possibly empty members first. That is, you should expect implementations to assign an empty no_unique_address member to the offset of the next member (or to the offset of the containing struct as a whole).
Similar to c union and bitfields but in C++ and including access to initial sequence
Similar to Union common initial sequence with primitive but using bitfields
What I want to do is this:
struct A
{
short common : 1;
short a1: 5;
short a2: 8;
};
struct B
{
short common : 1;
short b1: 3;
short b2: 4;
short b3: 6;
};
union C
{
A a;
B b;
};
Then use it as follows:
short foo(C data)
{
if (data.a.common)
{
return data.a.a1*data.a.a2;
}
return data.b.b1*data.b.b2*data.b.b3;
}
The issue seems to be that if data.a.common is false, the code will have accessed data set as a B via a member of A
There seems to be language in the standard about unions and similar initial sequences, but not sure common as a bitfield would count
9.5.1 it is permitted to inspect the common initial sequence of any of standard-layout struct members;
Any clarification would be appreciated
By the definition of common initial sequence:
The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_unique_address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field.
the common initial sequence of struct A and struct B consists of the first bit-field common.
I am currently working on a project in which I am provided the following
structre. My work is C++ but the project uses both C and C++. The same structure
definition is used by both C and C++.
typedef struct PacketHeader {
//Byte 0
uint8_t bRes :4;
uint8_t bEmpty :1;
uint8_t bWait :1;
uint8_t bErr :1;
uint8_t bEnable :1;
//Byte 1
uint8_t bInst :4;
uint8_t bCount :3;
uint8_t bRres :1;
//Bytes 2, 3
union {
uint16_t wId; /* Needed for Endian swapping */
struct{
uint16_t wMake :4;
uint16_t wMod :12;
};
};
} PacketHeader;
Depending on how instances of the structure are used, the required endianness of
the structure can be big or little endian. As the first two bytes of the
structure are each single bytes, these don't need altering when the endianness
changes.
Bytes 2 and 3, stored as a single uint16_t, are the only bytes which we need to
swap to acheive the desired endianness. To acheive the endianness swap, we have
been performing the following:
//Returns a constructed instance of PacketHeader with relevant fields set and the provided counter value
PacketHeader myHeader = mmt::BuildPacketHeader(count);
uint16_t packetIdFlipped;
//Swap positions of byte 2 and 3
packetIdFlipped = myHeader.wId << 8;
packetIdFlipped |= (uint16_t)myHeader.wId >> 8;
myHeader.wId = packetIdFlipped;
The function BuildPacketHeader(uint8_t) assigns values to the members wMake and
wMod explicitly, and does not write to the member wId. My question is regarding
the safety of reading from the member wId inside the returned instance of the
structure.
Questions such as
Accessing inactive union member and undefined behavior?,
Purpose of Unions in C and C++,
and Section 10.4 of the draft standard I have each mention the undefined behaviour arising from accessing an inactive member of a union in C++.
Paragraph 1 in Section 10.4 of the linked draft also contains the following note, though I'm not sure I understand all the terminology used:
[Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (10.3), and if a non-static datamember of an object of this standard-layout union type is active and is one of the standard-layout structs, itis permitted to inspect the common initial sequence of any of the standard-layout struct members; see 10.3.— end note]
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Is the unnamed struct the active member as it was the last member written to in the function call?
Or does the note mean it is safe to access the wId member, as it and the struct share a common type? (and is this what is meant by common initial sequence?)
Thanks in advance
The function BuildPacketHeader(uint8_t) assigns values to the members
wMake and wMod explicitly, and does not write to the member wId. My
question is regarding the safety of reading from the member wId inside
the returned instance of the structure.
Yes, it's UB. It does not mean it's not working, just that it may not work. You can use memcpy inside BuildPacketHeader to avoid that (see this and this).
Is reading myHeader.wId in the line packetIdFlipped = myHeader.wId << 8 undefined behaviour?
Yes. You assigned to wMake and wMod making the unamed struct the active member so wId is the inactive member and you are not allowed to read from it without setting a value to it.
and is this what is meant by common initial sequence?
The common initial sequence is when two standard layout types share the same members in the same order. In
struct foo
{
int a;
int b;
};
struct bar
{
int a;
int b;
int c;
};
a and b are of the same type in foo and bar so they are the common initial sequence of them. If you put objects of foo and bar in a union it would be safe to read a or b from wither object after it is set in one of them.
This is not your case though since wId isn't a standard layout type struct.
What the C++ standard kind of says is given two structs A and B and the following untion:
union U
{
A a;
B b;
};
The following is valid code:
U u;
A a;
u.a = a;
a = u.a;
B b;
u.b = b;
b = u.b;
You read and write the same type. This is obviously correct code.
But the problem comes when you have the following code:
A a;
B b;
u.a = a;
b = u.b;
What do we know about A and B? First in the union they share the same memory space. Now the C++ standard has plainly declared it as undefined behavior.
But that does not mean it's fully out of the window. C99 comes into play, since it is the normative base and there are weak guarantees about unions. That is if the union member have the same memory layout they are compatible and each structs first memory address is the same. So if you can ensure that your structs / union members are all padded in the correct way the operation is safe, even if C++ says it's undefined.
Finally from a pragmatic standpoint, if you don't mess with padding and get the standard layout, the compiler will generally do the right thing, since that is a quite old usage pattern in C and breaking this will break LOTS of code.
Sort of related to my previous question:
Do elements of arrays count as a common initial sequence?
struct arr4 { int arr[4]; };
struct arr2 { int arr[2]; };
union U
{
arr4 _arr4;
arr2 _arr2;
};
U u;
u._arr4.arr[0] = 0; //write to active
u._arr2.arr[0]; //read from inactive
According to this cppreference page:
In a standard-layout union with an active member of non-union class type T1, it is permitted to read a non-static data member m of another union member of non-union class type T2 provided m is part of the common initial sequence of T1 and T2....
Would this be legal, or would it also be illegal type punning?
C++11 says (9.2):
If a standard-layout union contains two or more standard-layout structs that share a common initial sequence,
and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted
to inspect the common initial part of any of them. Two standard-layout structs share a common initial
sequence if corresponding members have layout-compatible types and either neither member is a bit-field or
both are bit-fields with the same width for a sequence of one or more initial members.
As to whether arrays of different size form a valid common initial sequence, 3.9 says:
If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types
These arrays are not the same type, so this doesn't apply. There is no special further exception for arrays, so the arrays may not be layout-compatible and do not form a common initial sequence.
In practice, though, I know of a compiler (GCC) which:
ignores the "common initial sequence" rule, and
allows type punning anyway, but only when accesses are "via the union type" (as in your example), in which case the "common initial sequence" rule is obeyed indirectly (because a "common initial sequence" implies a common initial layout on the architectures the compiler supports).
I suspect many other compilers take a similar approach. In your example, where you type-pun via the union object, such compilers will give you the expected result - reading from the inactive member should give you value written via the inactive member.
The C Standard would allow an implementation to vary the placement of an array object within a structure based upon the number of elements. Among other things, there may be some circumstances where it may be useful to word-align a byte array which would occupy exactly one word, but not to word-align arrays of other sizes. For example, on a system with 8-bit char and 32-bit words, processing a structure such as:
struct foo {
char header;
char dat[4];
};
in a manner that word-aligns dat may allow an access to dat[i] to be processed by loading a word and shifting it right by a 0, 8, 16, or 24 bits, but such advantages might not be applicable had the structure instead been:
struct foo {
char header;
char dat[5];
};
The Standard was clearly not intended to forbid implementations from laying out structures in such ways, on platforms where doing so would be useful. On the other hand, when the Standard was written, compilers which would place arrays within a structure at offsets that were unaffected by the arrays' sizes would unanimously behave as though array elements that were present in two structures were part of the same Common Initial Sequence, and nothing in the published Rationale for the Standard suggests any intention to discourage such implementations from continuing to behave in such fashion. Code which relied upon such treatment would have been "non-portable", but correct on all implementations which followed common struct layout practices.
Let's say that I have the following C++ code:
struct something
{
// ...
union { int size, length; };
// ...
};
This would create two members of the struct which access the same value: size and length.
Would treating the two members as complete aliases (i.e. setting the size, then accessing the length and vice/versa) be undefined behaviour? Is there a "better" way to implement this type of behaviour, or is this an acceptable implementation?
Yes, this is allowed and well-defined. According to §3.10 [basic.lval]:
10/ If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
— the dynamic type of the object
[...]
Since here we store an int and read through an int, we access the object through a glvalue of the same dynamic type than the object, thus things are fine.
There even is a special caveat in the Standard for structures that share the same prefix. Or, in standardese, standard-layout types that share a common initial sequence.
§9.2/18 If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
That is:
struct A { unsigned size; char type; };
struct B { unsigned length; unsigned capacity; };
union { A a; B b; } x;
assert(x.a.size == x.b.length);
EDIT: Given that int is not a struct (nor a class) I am afraid it's actually not formally defined (I certainly could not see anything in the Standard), but should be safe in practice... I've brought the matters to the isocpp forums; you might have found a hole.
EDIT: Following the above mentionned discussion, I have been shown §3.10/10.
It is not undefined behavior. Both of the aliases in the union will be accessing the same location in the memory. See below:
§9.2/18 If a standard-layout union contains two or more
standard-layout structs that share a common initial sequence, and if
the standard-layout union object currently contains one of these
standard-layout structs, it is permitted to inspect the common initial
part of any of them. Two standard-layout structs share a common
initial sequence if corresponding members have layout-compatible types
and either neither member is a bit-field or both are bit-fields with
the same width for a sequence of one or more initial members.
It is undefined if types have different initial sequence.
Values will be same. If you assign 5 to size then length will also be 5.