Let's say that I have the following C++ code:
struct something
{
// ...
union { int size, length; };
// ...
};
This would create two members of the struct which access the same value: size and length.
Would treating the two members as complete aliases (i.e. setting the size, then accessing the length and vice/versa) be undefined behaviour? Is there a "better" way to implement this type of behaviour, or is this an acceptable implementation?
Yes, this is allowed and well-defined. According to §3.10 [basic.lval]:
10/ If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
— the dynamic type of the object
[...]
Since here we store an int and read through an int, we access the object through a glvalue of the same dynamic type than the object, thus things are fine.
There even is a special caveat in the Standard for structures that share the same prefix. Or, in standardese, standard-layout types that share a common initial sequence.
§9.2/18 If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
That is:
struct A { unsigned size; char type; };
struct B { unsigned length; unsigned capacity; };
union { A a; B b; } x;
assert(x.a.size == x.b.length);
EDIT: Given that int is not a struct (nor a class) I am afraid it's actually not formally defined (I certainly could not see anything in the Standard), but should be safe in practice... I've brought the matters to the isocpp forums; you might have found a hole.
EDIT: Following the above mentionned discussion, I have been shown §3.10/10.
It is not undefined behavior. Both of the aliases in the union will be accessing the same location in the memory. See below:
§9.2/18 If a standard-layout union contains two or more
standard-layout structs that share a common initial sequence, and if
the standard-layout union object currently contains one of these
standard-layout structs, it is permitted to inspect the common initial
part of any of them. Two standard-layout structs share a common
initial sequence if corresponding members have layout-compatible types
and either neither member is a bit-field or both are bit-fields with
the same width for a sequence of one or more initial members.
It is undefined if types have different initial sequence.
Values will be same. If you assign 5 to size then length will also be 5.
Related
Similar to c union and bitfields but in C++ and including access to initial sequence
Similar to Union common initial sequence with primitive but using bitfields
What I want to do is this:
struct A
{
short common : 1;
short a1: 5;
short a2: 8;
};
struct B
{
short common : 1;
short b1: 3;
short b2: 4;
short b3: 6;
};
union C
{
A a;
B b;
};
Then use it as follows:
short foo(C data)
{
if (data.a.common)
{
return data.a.a1*data.a.a2;
}
return data.b.b1*data.b.b2*data.b.b3;
}
The issue seems to be that if data.a.common is false, the code will have accessed data set as a B via a member of A
There seems to be language in the standard about unions and similar initial sequences, but not sure common as a bitfield would count
9.5.1 it is permitted to inspect the common initial sequence of any of standard-layout struct members;
Any clarification would be appreciated
By the definition of common initial sequence:
The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_unique_address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field.
the common initial sequence of struct A and struct B consists of the first bit-field common.
inside a definition like this
typedef struct
{
myType array[N];
} myStruct;
myStruct obj;
can I always assume that ([edit] assuming proper casting will happen which is not the focus of the question here [/edit])
(&obj == &obj.array[0])
will return TRUE or the should I worry about the compiler introducing extra padding to accomodate the myType alignment requisites? In theory this shouldn't happen as the struct has a single field but I'm not entirely sure about this.
With a proper cast, this will always return true.
From section 6.7.2.1 of the C standard:
13. Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase
in the order in which they are declared. A pointer to a structure
object, suitably converted, points to its initial member (or
if that member is a bit-field, then to the unit in which it
resides), and vice versa. There may be unnamed padding within a
structure object, but not at its beginning.
According to the current C++ standard draft [class.mem] §20 (N4527), emphasis added:
If a standard-layout class object has any non-static data members, its address is the same as the address
of its first non-static data member. Otherwise, its address is the same as the address of its first base class
subobject (if any). [ Note: There might therefore be unnamed padding within a standard-layout struct
object, but not at its beginning, as necessary to achieve appropriate alignment. — end note ]
Whether myStruct is standard-layout, depends on whether myType is standard-layout. If it is, then what you're asking is guaranteed by the c++ standard.
Note that &obj and &obj.array[0] have unrelated pointer types, so the expression is not legal in c++.
Sort of related to my previous question:
Do elements of arrays count as a common initial sequence?
struct arr4 { int arr[4]; };
struct arr2 { int arr[2]; };
union U
{
arr4 _arr4;
arr2 _arr2;
};
U u;
u._arr4.arr[0] = 0; //write to active
u._arr2.arr[0]; //read from inactive
According to this cppreference page:
In a standard-layout union with an active member of non-union class type T1, it is permitted to read a non-static data member m of another union member of non-union class type T2 provided m is part of the common initial sequence of T1 and T2....
Would this be legal, or would it also be illegal type punning?
C++11 says (9.2):
If a standard-layout union contains two or more standard-layout structs that share a common initial sequence,
and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted
to inspect the common initial part of any of them. Two standard-layout structs share a common initial
sequence if corresponding members have layout-compatible types and either neither member is a bit-field or
both are bit-fields with the same width for a sequence of one or more initial members.
As to whether arrays of different size form a valid common initial sequence, 3.9 says:
If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types
These arrays are not the same type, so this doesn't apply. There is no special further exception for arrays, so the arrays may not be layout-compatible and do not form a common initial sequence.
In practice, though, I know of a compiler (GCC) which:
ignores the "common initial sequence" rule, and
allows type punning anyway, but only when accesses are "via the union type" (as in your example), in which case the "common initial sequence" rule is obeyed indirectly (because a "common initial sequence" implies a common initial layout on the architectures the compiler supports).
I suspect many other compilers take a similar approach. In your example, where you type-pun via the union object, such compilers will give you the expected result - reading from the inactive member should give you value written via the inactive member.
The C Standard would allow an implementation to vary the placement of an array object within a structure based upon the number of elements. Among other things, there may be some circumstances where it may be useful to word-align a byte array which would occupy exactly one word, but not to word-align arrays of other sizes. For example, on a system with 8-bit char and 32-bit words, processing a structure such as:
struct foo {
char header;
char dat[4];
};
in a manner that word-aligns dat may allow an access to dat[i] to be processed by loading a word and shifting it right by a 0, 8, 16, or 24 bits, but such advantages might not be applicable had the structure instead been:
struct foo {
char header;
char dat[5];
};
The Standard was clearly not intended to forbid implementations from laying out structures in such ways, on platforms where doing so would be useful. On the other hand, when the Standard was written, compilers which would place arrays within a structure at offsets that were unaffected by the arrays' sizes would unanimously behave as though array elements that were present in two structures were part of the same Common Initial Sequence, and nothing in the published Rationale for the Standard suggests any intention to discourage such implementations from continuing to behave in such fashion. Code which relied upon such treatment would have been "non-portable", but correct on all implementations which followed common struct layout practices.
Given a POD-struct (in C++03) or a standard layout type (in C++11), with all members having a fundamental alignment requirement, is it true that every member is guaranteed to be aligned according to its alignment requirement?
In other words, for all members m_k in { m0 ... mn } of standard layout type S,
struct S {
T0 m0;
T1 m1;
...
TN mn;
};
is the following expression guaranteed to evaluate to true?
(offsetof(S,m_k) % alignof(decltype(S::m_k))) == 0
Please give answers for both C++03 and C++11 and cite the relevant parts of the standard. Supporting evidence from C standards would also be helpful.
My reading of the C++03 standard (ISO/IEC 14882:2003(E)) is that it is silent regarding the alignment of members within a POD-struct, except for the first member. The relevant paragraphs are:
In the language of the specification, an object is a "region of storage":
1.8 The C + + object model [intro.object]
1.8/1 The constructs in a C + + program create, destroy, refer to, access, and manipulate objects. An object is a region of storage. ...
Objects are allocated according to their alignment requirement:
3.9 Types [basic.types]
3.9/5 Object types have alignment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
Fundamental types have alignment requirements:
3.9.1 Fundamental types [basic.fundamental]
3.9.1/3 For each of the signed integer types, there exists a corresponding (but different) unsigned integer type: "unsigned char", "unsigned short int", "unsigned int", and "unsigned long int," each of which occupies the same amount of storage and has the same alignment requirements (3.9) as the corresponding signed integer type;...
Padding may occur due to "implementation alignment requirements":
9.2 Class members [class.mem]
9.2/12 Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic
data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).
Does the word "allocated" in 9.2/12 have the same meaning as "allocated" in 3.9/5? Most uses of "allocated" in the spec refer to dynamic storage allocation, not struct-internal layout. The use of may in 9.2/12 seems to imply that the alignment requirements of 3.9/5 and 3.9.1/3 might not be strictly required for struct members.
The first member of a POD-struct will be aligned according to the alignment requirement of the struct:
9.2/17 A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]
[Emphasis added in all of the above quotes.]
Each element of a POD struct is itself an object, and objects can only be allocated in accordance with the alignment requirements for those objects. The alignment requirements may change, though, due to the fact that something is a sub-object of another object ([basic.align]/1, 2:
1 Object types have alignment requirements (3.9.1, 3.9.2) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier (7.6.2).
2 A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (18.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject. [Example:
struct B { long double d; };
struct D : virtual B { char c; }
When D is the type of a complete object, it will have a subobject of type B, so it must be aligned appropriately for a long double. If D appears as a subobject of another object that also has B as a virtual base class, the
B subobject might be part of a different subobject, reducing the alignment requirements on the D subobject.—end example ] The result of the alignof operator reflects the alignment requirement of the type in the
complete-object case.
[emphasis added]
Although the examples refer to a sub-object via inheritance, the normative wording just refers to sub-objects in general, so I believe the same rules apply, so on one hand you can assume that each sub-object is aligned so that it can be accessed. On the other hand, no you can't necessarily assume that will be the same alignment that alignof gives you.
[The reference is from N4296, but I believe the same applies to all recent versions. C++98/03, of course, didn't have alignof at all, but I believe the same basic principle applies--members will be aligned so they can be used, but that alignment requirement isn't necessarily the same as when they're used as independent objects.]
There are two distinct alignment requirements for each type. One corresponds to complete objects, the other one to subobjects. [basic.align]/2:
The alignment required for a type might be different when it is used
as the type of a complete object and when it is used as the type of a
subobject. [ Example:
struct B { long double d; };
struct D : virtual B { char c; };
When D is the type of a complete object, it will have a subobject of
type B, so it must be aligned appropriately for a long double. If
D appears as a subobject of another object that also has B as a
virtual base class, the B subobject might be part of a different
subobject, reducing the alignment requirements on the D subobject.
—end example ] The result of the alignof operator reflects the alignment requirement of the type in the complete-object case.
I can't think of any specific example for member subobjects - and there assuredly is none! -, but the above paragraph concedes the possibility that a member subobject has weaker alignment than alignof will yield, which would fail your condition.
I came across the initial sequence concept. Serching through the Standard for initial sequence phrase gives only 3 results and they don't give a definition.
Section N3797::9.5/1 [class.union]:
If a standard-layout union contains several standard-layout structs
that share a common initial sequence (9.2), and if an object of this
standard-layout union type contains one of the standard-layout
structs, it is permitted to inspect the common initial sequence of any
of standard-layout struct members;
I wish to look at an example which is demostrated that quote.
I believe it's talking about this kind of thing:
union U {
struct S {
int a;
int b;
int c;
}
struct T {
int x;
int y;
float f;
}
};
It's saying that it's OK to access either U.S.a or U.T.x and that they will be equivalent. Ditto for U.S.b and U.T.y of course. But accessing U.T.f after setting U.S.c would be undefined behaviour.