As the (Working Draft of) C++ Standard says:
9.5.1 [class.union]
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.
But I don't know how to identify which member of a union is the active one, and I'm not familiar enough with the standard to locate what it says about this. I've tried to figure out how the active member is set, but I've only found how it is swapped:
9.5.4 [class.union]
[ Note: In general, one must use explicit destructor calls and placement new operators to change the active member of a union. —end note ] [Example: Consider an object u of a union type U having non-static data members m of type M and n of type N. If M has a non-trivial destructor and N has a non-trivial constructor (for instance, if they declare or inherit virtual functions), the active member of u can be safely switched from m to n using the destructor and placement new operator as follows:
u.m.~M();
new (&u.n) N;
—end example ]
So my guess is that the active member of a union is the one first assigned, used, constructed or placement-new'ed; but this becomes kind of tricky with uniform initialization. Consider the following code:
union Foo
{
struct {char a,b,c,d;};
char array[4];
int integer;
};
Foo f; // default ctor
std::cout << f.a << f.b << f.c << f.d << '\n';
Which is the active member of the union in the code above? Is std::cout reading from the active member of the union? What about the code below?
Foo f{0,1,2,3}; // uniform initialization
std::cout << f.a << f.b << f.c << f.d << '\n';
With the lines above we can initialize either the nested anonymous struct or the array. If I provide only one integer I can initialize Foo::a, Foo::array or Foo::integer... which one would be the active member?
Foo f{0}; // uniform initialization
std::cout << f.integer << '\n';
I guess that the active member would be the anonymous struct in all of the above cases, but I'm not sure.
If I want to activate one or the other union member, should I provide a constructor activating it?
union Bar
{
// #1 Activate anonymous struct
Bar(char x, char y, char z, char t) : a(x),b(y),c(z),d(t) {}
// #2 Activate array
Bar(char (&a)[4]) { std::copy(std::begin(a), std::end(a), std::begin(array)); }
// #3 Activate integer
Bar(int i) : integer(i) {}
struct {char a,b,c,d;};
char array[4];
int integer;
};
I'm almost sure that #1 and #3 will make the anonymous struct and the integer, respectively, the active member, but I don't know about #2, because by the time we reach the body of the constructor the members are already constructed! So are we calling std::copy on an inactive union member?
Questions:
Which are the active union members of Foo if it is constructed with the following uniform initialization:
Foo{};
Foo{1,2,3,4};
Foo{1};
In constructor #2 of Bar, is Bar::array the active union member?
Where in the standard can I read about which is exactly the active union member and how to set it without placement new?
Your concern about the lack of a rigorous definition of the active member of a union is shared by (at least some of) the members of the standardization committee - see the latest note (dated May 2015) in the description of active issue 1116:
We never say what the active member of a union is, how it can be changed, and so on. [...]
I think we can expect some sort of clarification in future versions of the working draft. That note also indicates that the best we have so far is the note in the paragraph you quoted in your question, [9.5p4].
That being said, let's look at your other questions.
First of all, there are no anonymous structs in standard C++ (only anonymous unions); struct {char a,b,c,d;}; will give you warnings if compiled with reasonably strict options (-std=c++1z -Wall -Wextra -pedantic for Clang and GCC, for example). Going forward, I'll assume we have a declaration like struct { char a, b, c, d; } s; and everything else is adjusted accordingly.
The implicitly defaulted default constructor in your first example doesn't perform any initialization according to [12.6.2p9.2]:
In a non-delegating constructor, if a given potentially constructed
subobject is not designated by a mem-initializer-id (including the
case where there is no mem-initializer-list because the constructor
has no ctor-initializer), then
(9.1) - if the entity is a non-static data member that has a brace-or-equal-initializer and either
(9.1.1) - the constructor’s class is a union (9.5), and no other variant member of that union is designated by a mem-initializer-id or
(9.1.2) - the constructor’s class is not a union, and, if the entity is a member of an anonymous union, no other member of that union is designated by a mem-initializer-id,
the entity is initialized as specified in 8.5;
(9.2) - otherwise, if the entity is an anonymous union or a variant member (9.5), no initialization is performed;
(9.3) - otherwise, the entity is default-initialized (8.5).
I suppose we could say that f has no active member after its default constructor has finished executing, but I don't know of any standard wording that clearly indicates that. What can be said in practice is that it makes no sense to attempt to read the value of any of f's members, since they're indeterminate.
In your next example, you're using aggregate initialization, which is reasonably well-defined for unions according to [8.5.1p16]:
When a union is initialized with a brace-enclosed initializer, the
braces shall only contain an initializer-clause for the first
non-static data member of the union. [ Example:
union u { int a; const char* b; };
u a = { 1 };
u b = a;
u c = 1; // error
u d = { 0, "asdf" }; // error
u e = { "asdf" }; // error
— end example ]
That, together with brace elision for the initialization of the nested struct, as specified in [8.5.1p12], makes the struct the active member. It answers your next question as well: you can only initialize the first union member using that syntax.
Your next question:
If I want to activate one or the other union member, should I provide a constructor activating it?
Yes, or a brace-or-equal-initializer for exactly one member according to [12.6.2p9.1.1] quoted above; something like this:
union Foo
{
struct { char a, b, c, d; } s;
char array[4];
int integer = 7;
};
Foo f;
After the above, the active member will be integer. All of the above should also answer your question about #2 (the members are not already constructed when we reach the body of the constructor - #2 is fine as well).
Wrapping up, both Foo{} and Foo{1} perform aggregate initialization; they're interpreted as Foo{{}} and Foo{{1}}, respectively, (because of brace elision), and initialize the struct; the first one sets all the struct members to 0 and the second one sets the first member to 1 and the rest to 0, according to [8.5.1p7].
All standard quotes are from the current working draft, N4527.
Paper N4430, which deals with somewhat related issues, but hasn't been integrated into the working draft yet, provides a definition for active member:
In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended ([basic.life]).
This effectively passes the buck to the definition of lifetime in [3.8], which also has a few issues open against it, including the aforementioned issue 1116, so I think we'll have to wait for several such issues to be resolved in order to have a complete and consistent definition. The definition of lifetime as it currently stands doesn't seem to be quite ready.
The active member is the last member you wrote to. Simple as that.
The term is not defined by C++ because it is defined by English.
Consider the code:
struct Foo
{
const char str[] = "test";
};
int main()
{
Foo foo;
}
It fails to compile with both g++ and clang++, spitting out essentially
error: array bound cannot be deduced from an in-class initializer
I understand that this is what the standard probably says, but is there any particular good reason why? Since we have a string literal it seems that the compiler should be able to deduce the size without any problem, similarly to the case when you simply declare an out-of-class const C-like null terminated string.
The reason is that you always have the possibility to override an in-class initializer list in the constructor. So I guess that in the end, it could be very confusing.
struct Foo
{
Foo() {} // str = "test" plus the terminating '\0', so 5 chars
// Implementing this is easier if I can clearly see how big `str` is:
Foo() : str({'a','b', 'c', 'd'}) {} // a competing initializer: only 4 chars here
const char str[] = "test";
};
Notice that replacing const char with static constexpr char works perfectly, and probably it is what you want anyway.
As mentioned in the comments and as answered by @sbabbi, the answer lies in the details:
12.6.2 Initializing bases and members [class.base.init]
In a non-delegating constructor, if a given non-static data member or
base class is not designated by a mem-initializer-id (including the
case where there is no mem-initializer-list because the constructor
has no ctor-initializer) and the entity is not a virtual base class of
an abstract class (10.4), then
if the entity is a non-static data member that has a brace-or-equal-initializer , the entity is initialized as specified in
8.5;
otherwise, if the entity is an anonymous union or a variant member (9.5), no initialization is performed;
otherwise, the entity is default-initialized
12.6.2 Initializing bases and members [class.base.init]
If a given non-static data member has both a
brace-or-equal-initializer and a mem-initializer, the initialization
specified by the mem-initializer is performed, and the non-static data
member’s brace-or-equal-initializer is ignored. [ Example: Given
struct A {
int i = /* some integer expression with side effects */ ;
A(int arg) : i(arg) { }
// ...
};
the A(int) constructor will simply initialize i to the value of arg,
and the side effects in i’s brace-or-equal-initializer will not take
place. — end example ]
So, if a non-delegating constructor provides a mem-initializer for the member, the brace-or-equal-initializer is ignored and the mem-initializer prevails. Thus, for array members whose size is omitted, the declaration becomes ill-formed. §12.6.2, item 9, makes this explicit: the brace-or-equal-initializer is skipped whenever the constructor initializes the member through a mem-initializer.
Also, the Google Groups discussion "Yet another inconsitent behavior in C++" elaborates further and makes this clearer. It explains that a brace-or-equal-initializer is a glorified mem-initializer, used in the cases where the constructor provides no mem-initializer for the member. As an example,
struct Foo {
int i[5] ={1,2,3,4,5};
int j;
Foo(): j(0) {};
};
is equivalent to
struct Foo {
int i[5];
int j;
Foo(): i{1,2,3,4,5}, j(0) {}; // members are initialized in declaration order
};
but now we see that if the array size was omitted, the expression would be ill-formed.
That said, the compiler could have supported the feature in the cases where no constructor provides a mem-initializer for the member; for the sake of uniformity, however, the standard currently does not support it.
If the compiler were allowed to support what you described, and the size of str were deduced to be 5,
Foo foo = {{"This is not a test"}};
would lead to undefined behavior.
If I have a union with two data members of the same type, differing only by CV-qualification:
template<typename T>
union A
{
private:
T x_priv;
public:
const T x_publ;
public:
// Accept-all constructor
template<typename... Args>
A(Args&&... args) : x_priv(args...) {}
// Destructor
~A() { x_priv.~T(); }
};
And I have a function f that declares a union A, thus making x_priv the active member and then reads x_publ from that union:
int f()
{
A<int> a {7};
return a.x_publ;
}
In every compiler I tested there were no errors, either at compile time or at runtime, both for int and for more complex types such as std::string and std::thread.
I checked the standard to see whether this is legal behavior, starting with the difference between T and const T:
6.7.3.1 [basic.type.qualifier]
The cv-qualified or cv-unqualified versions of a type are distinct types; however, they shall have the same representation and alignment requirements ([basic.align]).
This means that when declaring a const T it has the exact same representation in memory as a T. But then I found that the standard actually disallows this for some types, which I found weird, as I see no reason for it.
I started my search on accessing non-active members.
It is only legal to access the common initial sequence of T and const T if both are standard-layout types.
10.4.1[class.union]
At most one of the non-static data members of an object of union type can be active at any time [...] [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence ([class.mem]), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem]. — end note ]
The initial sequence is basically the order of the non-static data members with a few exceptions, but since T and const T have the exact same members in the same layout, this means that the common initial sequence of T and const T is all of the members of T.
10.3.22 [class.mem]
The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_unique_address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field. [...]
And here is where the restrictions come in: the standard prevents some types from being accessed this way, even though they have the exact same representation in memory:
10.1.3 [class.prop]
A class S is a standard-layout class if it:
(3.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,
(3.2) has no virtual functions and no virtual base classes,
(3.3) has the same access control for all non-static data members,
(3.4) has no non-standard-layout base classes,
(3.5) has at most one base class subobject of any given type,
(3.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
(3.7) has no element of the set M(S) of types as a base class, where for any type X, M(X) is defined as follows. [ Note: M(X) is the set of the types of all non-base-class subobjects that may be at a zero offset in X. — end note ]
(3.7.1) If X is a non-union class type with no (possibly inherited) non-static data members, the set M(X) is empty.
(3.7.2) If X is a non-union class type with a non-static data member of type X_0 that is either of zero size or is the first non-static data member of X (where said member may be an anonymous union), the set M(X) consists of X_0 and the elements of M(X_0).
(3.7.3) If X is a union type, the set M(X) is the union of all M(U_i) and the set containing all U_i, where each U_i is the type of the ith non-static data member of X.
(3.7.4) If X is an array type with element type X_e, the set M(X) consists of X_e and the elements of M(X_e).
(3.7.5) If X is a non-class, non-array type, the set M(X) is empty.
My question is: is there any reason for this not to be valid behavior?
Essentially is it that:
The standard makers forgot to account for this particular case?
I haven't read some part of the standard that allows this behavior?
There's some more specific reason for this not to be valid behavior?
A reason for this to be valid syntax is, for example, having a 'readonly' variable in a class, as such:
struct B;
struct A
{
... // Everything that struct A had before
friend B;
};
struct B
{
A member;
void f() { member.x_priv = 100; }
};
int main()
{
B b;
b.f(); // Modifies the value of member.x_priv
//b.member.x_priv = 100; // Invalid, x_priv is private
int x = b.member.x_publ; // Fine, x_publ is public
}
This way you don't need a getter function, which can cause performance overhead; although most compilers would optimize that away, it still adds to your class's interface, and to read the variable you'd have to write int x = b.get_x().
Nor would you need a const reference to that variable (as described in this question), which works great but adds size to your class, and that can be bad for sufficiently big classes or classes that need to be as small as possible.
It is weird having to write b.member.x_priv instead of b.x_priv, but this would be fixable if we could have private members in anonymous unions. Then we could rewrite it like this:
struct B
{
union
{
private:
int x_priv;
public:
int x_publ;
friend B;
};
void f() { x_priv = 100; }
};
int main()
{
B b;
b.f(); // Modifies the value of x_priv
//b.x_priv = 100; // Invalid, x_priv is private
int x = b.x_publ; // Fine, x_publ is public
}
Another use case might be to give various names to the same data member. For example, in a Shape the user might want to refer to the position as shape.pos, shape.position, shape.cur_pos or shape.shape_pos.
Although this would probably create more problems than it is worth, such a use case might be favorable when, for example, a name should be deprecated.
Code like this:
#include <stdio.h>
struct A { int i; };
struct B { int j; };
union U {
struct A a;
struct B b;
};
int main() {
union U u;
u.a.i = 1;
printf("%d\n", u.b.j);
}
is valid in C. For the sake of backward compatibility, it was considered desirable to ensure that it is also valid in C++. The special rules about common initial sequences of standard-layout structs ensure this backward compatibility. Extending the rule to allow more cases to be well-defined—ones involving non-standard-layout structs—is not necessary for C compatibility, since all structs that can be defined in the common subset of C and C++ are automatically standard-layout structs in C++.
Actually, the C++ rules are a little bit more permissive than required for C compatibility. They allow some cases involving base classes too:
struct A { int i; };
struct B { int j; };
struct C : A { };
struct D : B { };
// C and D have a common initial sequence consisting of C::i and D::j
But in general, structs in C++ can be much more complicated than their C counterparts. They can, for example, have virtual functions and virtual base classes, and those can affect their layout in an implementation-defined manner. For this reason, it's not so easy to make more cases of type punning through unions well-defined in C++. You would really have to sit down with implementers and discuss what the conditions would be such that the committee should mandate that two classes have the same layout for their common initial sequence and not leave it up to the implementation. Currently, that mandate applies only to standard-layout classes.
There are various rules in the standard that are strong enough to imply that T and const T always have the exact same layout even if T is not a standard-layout class. For this reason, it would be possible to make certain forms of type punning between a T member and a const T member of a union well-defined even if T is not standard-layout. However, adding only this very special case to the language is of dubious value and I think it's unlikely that the committee would accept such a proposal unless you have a really compelling use case. Not wanting to provide a getter that returns a const reference, simply because you don't want to write the () to call the getter each time you need access, is unlikely to convince the committee.
I'm creating a class in which one member is a const pointer (immutable address) to another member of the struct.
In the simplified version below, will both structs always behave the same way? In particular, are the addresses stored in ptr guaranteed to be properly initialized?
struct First
{
int a;
int* const ptr = &a;
};
struct Second
{
int a;
int* const ptr;
Second() : ptr(&a) {}
};
(In my actual application the member a is a class instance, and ptr is replaced by a map from some enums to pointers pointing to members of a.)
In the simplified version below, will both structs always behave the same way?
No they won't, but it may be ok for your case. Read on.
Both First::ptr and Second::ptr will be initialized to the expected value, that is, the address of First::a and of Second::a respectively, but:
[class.mem]/7 & [class.mem]/9
7 In a member-declarator, an = immediately following the declarator is interpreted as introducing a pure-specifier
if the declarator-id has function type, otherwise it is interpreted as introducing a brace-or-equal-initializer.
9 A brace-or-equal-initializer shall appear only in the declaration of a data member. (For static data members,
see 12.2.3.2; for non-static data members, see 15.6.2 and 11.6.1). A brace-or-equal-initializer for a non-static
data member specifies a default member initializer for the member, and shall not directly or indirectly cause
the implicit definition of a defaulted default constructor for the enclosing class or the exception specification
of that constructor.
This means that First has a defaulted default constructor whereas Second has a user-provided default constructor, which changes some characteristics of those classes. I can, for instance, think of aggregates, triviality and maybe standard layout.
Suppose I have the following struct:
struct sampleData
{
int x;
int y;
};
And when used, I want to initialize variables of sampleData type to a known state.
sampleData sample = { 1, 2 };
Later, I decide that I need additional data stored in my sampleData struct, as follows:
struct sampleData
{
int x;
int y;
int z;
};
It is my understanding that the two-field initialization left over from my pre-z data structure is still a valid statement and will compile, populating the missing fields with default values.
Is this understanding correct? I have been working recently in Ada, which also allows aggregate initialization, but which would flag a similar issue as a compilation error. Assuming that my assumptions about the C++ code above are correct, is there a language construct which would recognize missing initialization values as an error?
Initialising variables that way is only supported with Aggregate Classes.
If you add constructor(s) then the problem goes away, but you'll need to change the syntax a little and you lose the ability to store the struct in a union (among other things).
struct sampleData
{
sampleData(int x, int y) : x(x), y(y) {}
int x;
int y;
};
sampleData sample( 1, 2 );
Adding z (and changing the constructor) will mark sample( 1, 2 ) as a compile error.
Yes, any elements you leave off of the initialization list will be initialized to zero (for POD scalar types) or using their default constructor (for classes).
The relevant language from the C standard is quoted here:
[6.7.8.21] If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
I am sure someone more motivated than I could find the corresponding language in one of the C++ specs...
Note that this implies that POD scalar elements are initialized as if you wrote "= 0". Which means it will correctly initialize pointers to NULL and floats to 0.0 even if their representations do not happen to be all-zero bytes. It also implies that it works recursively; if your struct contains a struct, the inner struct will be properly initialized as well.
As a followup to Nemo's answer with the C standardese, here is what the C++03 standard says:
§8.5.1/7:
If there are fewer initializers in the list than there are members in the aggregate, then each member not explicitly initialized shall be value-initialized.
§8.5/5:
To value-initialize an object of type T means:
if T is a class type with a user-declared constructor, then the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);
if T is a non-union class type without a user-declared constructor, then every non-static data member and base-class component of T is value-initialized;
if T is an array type, then each element is value-initialized;
otherwise, the object is zero-initialized
To zero-initialize an object of type T means:
if T is a scalar type, the object is set to the value of 0 (zero) converted to T;
if T is a non-union class type, each nonstatic data member and each base-class subobject is zero-initialized;
if T is a union type, the object’s first named data member is zero-initialized;
if T is an array type, each element is zero-initialized;
if T is a reference type, no initialization is performed.
Why not use
sampleData sample = { x: 1, y:2 } ;
?
But you'd still run into the problem of z being initialized to an unpredictable value, so it's better to define a constructor which sets all variables to well defined values.