Why compiler doesn't allow std::string inside union? - c++

i want to use string inside Union.
if i write as below
union U
{
int i;
float f;
string s;
};
Compiler gives error saying U::S has copy constructor.
I read some other post for alternate ways for solving this issue.
But i want to know why compiler doesn't allow this in the first place?
EDIT: #KennyTM: In any union, if member is initialized others will have garbage values, if none is initialized all will have garbage values. I think, tagged union just provides some comfort to access valid values from Union.
Your question: how do you or the compiler write a copy constructor for the union above without extra information?
sizeof(string) gives 4 bytes. Based on this, compiler can compare other members sizes and allocate largest allocation(4bytes in our example). Internal string length doesn't matter because it will be stored in a seperate location. Let the string be of any length. All that Union has to know is invoking string class copy constructor with string parameter. In whichever way compiler finds that copy constructor has to be invoked in normal case, similar method as to be followed even when string is inside Union. So i am thinking compiler could do like, allocate 4 bytes. Then if any string is assigned to s, then string class will take care of allocation and copying of that string using its own allocator. So there is no chance of memory corruption as well.
Is string not existed at the time of Union developement in compiler ?
So the answer is not clear to me still.
Am a new joinee in this site, if anything wrong, pls excuse me.

Because having a class with a non-trivial (copy/)constructor in a union doesn't make sense. Suppose we have
union U {
string x;
vector<int> y;
};
U u; // <--
If U was a struct, u.x and u.y would be initialized to an empty string and empty vector respectively. But members of a union share the same address. So, if u.x is initialized, u.y will contain invalid data, and so is the reverse. If both of them are not initialized then they cannot be used. In any case, having these data in a union cannot be handled easily, so C++98 chooses to deny this: (§9.5/1):
An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects.
In C++0x this rule has been relaxed (§9.5/2):
At most one non-static data member of a union may have a brace-or-equal-initializer. [Note: if any non-static data member of a union has a non-trivial default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move
assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be user-provided or it will be implicitly deleted (8.4.3) for the union. — end note ]
but it is still a not possible to create (correct) con/destructors for the union, e.g. how do you or the compiler write a copy constructor for the union above without extra information? To ensure which member of the union is active, you need a tagged union, and you need to handle the construction and destruction manually e.g.
struct TU {
int type;
union {
int i;
float f;
std::string s;
} u;
TU(const TU& tu) : type(tu.type) {
switch (tu.type) {
case TU_STRING: new(&u.s)(tu.u.s); break;
case TU_INT: u.i = tu.u.i; break;
case TU_FLOAT: u.f = tu.u.f; break;
}
}
~TU() {
if (tu.type == TU_STRING)
u.s.~string();
}
...
};
But, as #DeadMG has mentioned, this is already implemented as boost::variant or boost::any.

Think about it. How does the compiler know what type is in the union?
It doesn't. The fundamental operation of a union is essentially a bitwise cast. Operations on values contained within unions are only safe when each type can essentially be filled with garbage. std::string can't, because that would result in memory corruption. Use boost::variant or boost::any.

In C++98/03, members of a union can't have constructors, destructors, virtual member functions, or base classes.
So basically, you can only use built-in data types, or PODs
Note that it is changing in C++0x: Unrestricted unions
union {
int z;
double w;
string s; // Illegal in C++98, legal in C++0x.
};

From the C++ spec §9.5.1:
An object of a class with a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator cannot be a member of a union.
The reason for this rule is that the compiler will never know which of the destructors/constructors call, since it never really knows which of the possible objects is inside the union.

The garbage is introduced if you
assign a string
then assign an int or float
then a string again
string manages memory somewhere else. This information is most likely some pointer. This pointer is garbaged when assigning the int. Assigning a new string should destroy the old string, which is not possible.
The second step should destroy the string, but does not know, if there has been a string.
They obviously have found a solution for this problem in the meantime.

You can now do it.
Of course if you initialize any other member of the union first, or simply don't initialize the string at all, then there's a problem.
Since the string class overloads the assignment operator, you can't then initialize the string with an assignment operation:
this->union_string = std::string("whatever");
Will fail because you're still using the assignment operator.
To properly initialize a union string after you've put something else in the union or not initialized it in the first place, you have to call the constructor directly on that memory:
new(&this->union_string) std::string("whatever");
This way you're simply not using the assignment function at all.
Another concern is your compiler should make you make a destructor, and if for some reason not, you should make it anyway. Since it's a union, by the end of your class's lifetime the compiler can't know whether that union memory is used by the string or something else, so your destructor should call the string's destructor if that's the case.
So if you don't do it, you'll have a memory leak since the constructor for the string is never called, and it never knows to release the memory it's using.

In new C++ standard (I tested it in C++17), you can use a complex type as a member of union.
struct ustring
{
union
{
string s;
wstring ws;
};
bool bAscii = true;
~ustring()
{
if (bAscii)
{
s.~string();
}
else
{
ws.~wstring();
}
}
};
However, you should be very careful. Think about you construct s but destruct ws.

Related

copy constructor of 'Car' is implicitly deleted because variant field 'fuelType' has a non-trivial copy constructor [duplicate]

i want to use string inside Union.
if i write as below
union U
{
int i;
float f;
string s;
};
Compiler gives error saying U::S has copy constructor.
I read some other post for alternate ways for solving this issue.
But i want to know why compiler doesn't allow this in the first place?
EDIT: #KennyTM: In any union, if member is initialized others will have garbage values, if none is initialized all will have garbage values. I think, tagged union just provides some comfort to access valid values from Union.
Your question: how do you or the compiler write a copy constructor for the union above without extra information?
sizeof(string) gives 4 bytes. Based on this, compiler can compare other members sizes and allocate largest allocation(4bytes in our example). Internal string length doesn't matter because it will be stored in a seperate location. Let the string be of any length. All that Union has to know is invoking string class copy constructor with string parameter. In whichever way compiler finds that copy constructor has to be invoked in normal case, similar method as to be followed even when string is inside Union. So i am thinking compiler could do like, allocate 4 bytes. Then if any string is assigned to s, then string class will take care of allocation and copying of that string using its own allocator. So there is no chance of memory corruption as well.
Is string not existed at the time of Union developement in compiler ?
So the answer is not clear to me still.
Am a new joinee in this site, if anything wrong, pls excuse me.
Because having a class with a non-trivial (copy/)constructor in a union doesn't make sense. Suppose we have
union U {
string x;
vector<int> y;
};
U u; // <--
If U was a struct, u.x and u.y would be initialized to an empty string and empty vector respectively. But members of a union share the same address. So, if u.x is initialized, u.y will contain invalid data, and so is the reverse. If both of them are not initialized then they cannot be used. In any case, having these data in a union cannot be handled easily, so C++98 chooses to deny this: (§9.5/1):
An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects.
In C++0x this rule has been relaxed (§9.5/2):
At most one non-static data member of a union may have a brace-or-equal-initializer. [Note: if any non-static data member of a union has a non-trivial default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move
assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be user-provided or it will be implicitly deleted (8.4.3) for the union. — end note ]
but it is still a not possible to create (correct) con/destructors for the union, e.g. how do you or the compiler write a copy constructor for the union above without extra information? To ensure which member of the union is active, you need a tagged union, and you need to handle the construction and destruction manually e.g.
struct TU {
int type;
union {
int i;
float f;
std::string s;
} u;
TU(const TU& tu) : type(tu.type) {
switch (tu.type) {
case TU_STRING: new(&u.s)(tu.u.s); break;
case TU_INT: u.i = tu.u.i; break;
case TU_FLOAT: u.f = tu.u.f; break;
}
}
~TU() {
if (tu.type == TU_STRING)
u.s.~string();
}
...
};
But, as #DeadMG has mentioned, this is already implemented as boost::variant or boost::any.
Think about it. How does the compiler know what type is in the union?
It doesn't. The fundamental operation of a union is essentially a bitwise cast. Operations on values contained within unions are only safe when each type can essentially be filled with garbage. std::string can't, because that would result in memory corruption. Use boost::variant or boost::any.
In C++98/03, members of a union can't have constructors, destructors, virtual member functions, or base classes.
So basically, you can only use built-in data types, or PODs
Note that it is changing in C++0x: Unrestricted unions
union {
int z;
double w;
string s; // Illegal in C++98, legal in C++0x.
};
From the C++ spec §9.5.1:
An object of a class with a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator cannot be a member of a union.
The reason for this rule is that the compiler will never know which of the destructors/constructors call, since it never really knows which of the possible objects is inside the union.
The garbage is introduced if you
assign a string
then assign an int or float
then a string again
string manages memory somewhere else. This information is most likely some pointer. This pointer is garbaged when assigning the int. Assigning a new string should destroy the old string, which is not possible.
The second step should destroy the string, but does not know, if there has been a string.
They obviously have found a solution for this problem in the meantime.
You can now do it.
Of course if you initialize any other member of the union first, or simply don't initialize the string at all, then there's a problem.
Since the string class overloads the assignment operator, you can't then initialize the string with an assignment operation:
this->union_string = std::string("whatever");
Will fail because you're still using the assignment operator.
To properly initialize a union string after you've put something else in the union or not initialized it in the first place, you have to call the constructor directly on that memory:
new(&this->union_string) std::string("whatever");
This way you're simply not using the assignment function at all.
Another concern is your compiler should make you make a destructor, and if for some reason not, you should make it anyway. Since it's a union, by the end of your class's lifetime the compiler can't know whether that union memory is used by the string or something else, so your destructor should call the string's destructor if that's the case.
So if you don't do it, you'll have a memory leak since the constructor for the string is never called, and it never knows to release the memory it's using.
In new C++ standard (I tested it in C++17), you can use a complex type as a member of union.
struct ustring
{
union
{
string s;
wstring ws;
};
bool bAscii = true;
~ustring()
{
if (bAscii)
{
s.~string();
}
else
{
ws.~wstring();
}
}
};
However, you should be very careful. Think about you construct s but destruct ws.

Is a constructor implicitly declared in the case of a abstract class with no data members? [duplicate]

In the book I'm reading at the moment (C++ Without Fear) it says that if you don't declare a default constructor for a class, the compiler supplies one for you, which "zeroes out each data member". I've experimented with this, and I'm not seeing any zeroing -out behaviour. I also can't find anything that mentions this on Google. Is this just an error or a quirk of a specific compiler?
If you do not define a constructor, the compiler will define a default constructor for you.
Construction
The implementation of this
default constructor is:
default construct the base class (if the base class does not have a default constructor, this is a compilation failure)
default construct each member variable in the order of declaration. (If a member does not have a default constructor, this is a compilation failure).
Note:
The POD data (int,float,pointer, etc.) do not have an explicit constructor but the default action is to do nothing (in the vane of C++ philosophy; we do not want to pay for something unless we explicitly ask for it).
Copy
If no destructor/copy Constructor/Copy Assignment operator is defined the compiler builds one of those for you (so a class always has a destructor/Copy Constructor/Assignment Operator (unless you cheat and explicitly declare one but don't define it)).
The default implementation is:
Destructor:
If user-defined destructor is defined, execute the code provided.
Call the destructor of each member in reverse order of declaration
Call the destructor of the base class.
Copy Constructor:
Call the Base class Copy Constructor.
Call the copy constructor for each member variable in the order of declaration.
Copy Assignment Operator:
Call the base class assignment operator
Call the copy assignment operator of each member variable in the order of declaration.
Return a reference to this.
Note Copy Construction/Assignment operator of POD Data is just copying the data (Hence the shallow copy problem associated with RAW pointers).
Move
If no destructor/copy Constructor/Copy Assignment/Move Constructor/Move Assignment operator is defined the compiler builds the move operators for you one of those for you.
The default implementation is:
Implicitly-declared move constructor
If no user-defined move constructors are provided for a class type (struct, class, or union), and all of the following is true:
Move Constructor:
Call the Base class Copy Constructor.
Call the move constructor for each member variable in the order of declaration.
Move Assignment Operator:
Call the base class assignment operator
Call the move assignment operator of each member variable in the order of declaration.
Return a reference to this.
I think it's worth pointing out that the default constructor will only be created by the compiler if you provide no constructor whatsoever. That means if you only provide one constructor that takes an argument, the compiler will not create the default no-arg constructor for you.
The zeroing-out behavior that your book talks about is probably specific to a particular compiler. I've always assumed that it can vary and that you should explicitly initialize any data members.
Does the compiler automatically generate a default constructor?
Does the implicitly generated default constructor perform zero
initialization?
If you legalistically parse the language of the 2003 standard, then the answers are yes, and no. However, this isn't the whole story because unlike a user-defined default constructor, an implicitly defined default constructor is not always used when creating an object from scratch -- there are two other scenarios: no construction and member-wise value-initialization.
The "no construction" case is really just a technicality because it is functionally no different than calling the trivial default constructor. The other case is more interesting: member-wise value-initialization is invoked by using "()" [as if explicitly invoking a constructor that has no arguments] and it bypasses what is technically referred to as the default constructor. Instead it recursively performs value-initialization on each data member, and for primitive data types, this ultimately resolves to zero-initialization.
So in effect, the compiler provides two different implicitly defined default constructors. One of which does perform zero initialization of primitive member data and the other of which does not. Here are some examples of how you can invoke each type of constructor:
MyClass a; // default-construction or no construction
MyClass b = MyClass(); // member-wise value-initialization
and
new MyClass; // default-construction or no construction
new MyClass(); // member-wise value-initialization
Note: If a user-declared default constructor does exist, then member-wise value-initialization simply calls that and stops.
Here's a somewhat detailed breakdown of what the standard says about this...
If you don't declare a constructor, the compiler implicitly creates a default constructor [12.1-5]
The default constructor does not initialize primitive types [12.1-7]
MyClass() {} // implicitly defined constructor
If you initialize an object with "()", this does not directly invoke the default constructor. Instead, it instigates a long sequence of rules called value-initialization [8.5-7]
The net effect of value initialization is that the implicitly declared default constructor is never called. Instead, a recursive member-wise value initialization is invoked which will ultimately zero-initialize any primitive members and calls the default constructor on any members which have a user-declared constructor [8.5-5]
Value-initialization applies even to primitive types -- they will be zero-initialized. [8.5-5]
int a = int(); // equivalent to int a = 0;
All of this is really moot for most purposes. The writer of a class cannot generally assume that data members will be zeroed out during an implicit initialization sequence -- so any self-managing class should define its own constructor if it has any primitive data members that require initialization.
So when does this matter?
There may be circumstances where generic code wants to force initialization of unknown types. Value-initialization provides a way to do this. Just remember that implicit zero-initialization does not occur if the user has provided a constructor.
By default, data contained by std::vector is value-initialized. This can prevent memory debuggers from identifying logic errors associated with otherwise uninitialized memory buffers.
vector::resize( size_type sz, T c=T() ); // default c is "value-initialized"
Entire arrays of primitives type or "plain-old-data" (POD)-type structures can be zero-initialized by using value-initialization syntax.
new int[100]();
This post has more details about variations between versions of the standard, and it also notes a case where the standard is applied differently in major compilers.
C++ does generate a default constructor but only if you don't provide one of your own. The standard says nothing about zeroing out data members. By default when you first construct any object, they're undefined.
This might be confusing because most of the C++ primitive types DO have default "constructors" that init them to zero (int(), bool(), double(), long(), etc.), but the compiler doesn't call them to init POD members like it does for object members.
It's worth noting that the STL does use these constructors to default-construct the contents of containers that hold primitive types. You can take a look at this question for more details on how things in STL containers get inited.
The default constructor created for a class will not initialize built-in types, but it will call the default constructor on all user-defined members:
class Foo
{
public:
int x;
Foo() : x(1) {}
};
class Bar
{
public:
int y;
Foo f;
Foo *fp;
};
int main()
{
Bar b1;
ASSERT(b1.f.x == 1);
// We know nothing about what b1.y is set to, or what b1.fp is set to.
// The class members' initialization parallels normal stack initialization.
int y;
Foo f;
Foo *fp;
ASSERT(f.x == 1);
// We know nothing about what y is set to, or what fp is set to.
}
The compiler will generate default constructors and destructors if user-created ones are not present. These will NOT modify the state of any data members.
In C++ (and C) the contents of any allocated data is not guaranteed. In debug configurations some platforms will set this to a known value (e.g. 0xFEFEFEFE) to help identify bugs, but this should not be relied upon.
Zero-ing out only occurs for globals. So if your object is declared in the global scope, its members will be zero-ed out:
class Blah
{
public:
int x;
int y;
};
Blah global;
int main(int argc, char **argv) {
Blah local;
cout<<global.x<<endl; // will be 0
cout<<local.x<<endl; // will be random
}
C++ does not guarantee zeroing out memory. Java and C# do (in a manner of speaking).
Some compilers might, but don't depend on that.
In C++11, a default constructor generated by the compiler is marked deleted , if :
the class has a reference field
or a const field without a user-defined default constructor
or a field without a default initializer, with a deleted default constructor
http://en.cppreference.com/w/cpp/language/default_constructor
C++ generates a default constructor. If needed (determined at compile time I believe), it will also generate a default copy constructor and a default assignment constructor. I haven't heard anything about guarantees for zeroing memory though.
The compiler by default will not be generating the default constructor unless the implementation does not require one .
So , basically the constructor has to be a non-trivial constructor.
For constructor to be non-trivial constructor, following are the conditions in which any one can suffice:
1) The class has a virtual member function.
2) Class member sub-objects or base classes have non-trivial constructors.
3) A class has virtual inheritance hierarchy.

why do I need a constructor function?

#include<iostream>
using namespace std;
class A {
public:
int i;
};
int main() {
const A aa; //This is wrong, I can't compile it! The implicitly-defined constructor does not initialize ‘int A::i’
}
when I use
class A {
public:
A() {}
int i;
};
this is ok! I can compile it! why I can't compile it when I use the implicitly-defined constructor?
why the implicit-defined constructor does not work?
It does work, but one of the language rules is that it can't be used to initialise a const object unless it initialises all the members; and it doesn't initialise members with trivial types like int. That usually makes sense, since being const there's no way to give them a value later.
(That's a slight simplification; see the comments for chapter and verse from the language standard.)
If you define your own constructor, then you're saying that you know what you're doing and don't want that member initialised. The compiler will let you use that even for a const object.
If you want to set it to zero, then you could value-initialise the object:
const A aa {}; // C++11 or later
const A aa = A(); // historic C++
If you want to set it to another value, or set it to zero without the user having to specify value-initialisation, then you'll need a constructor that initialises the member:
A() : i(whatever) {}
why the implicit-defined constructor does not work?
Because the C++ standard says so:
[dcl.init] paragraph 7:
If a program calls for the default initialization of an object of a const-qualified type T, T shall be a class type with a user-provided default constructor.
This ensures that you don't create a const object containing uninitialized data that cannot be initialized later.
To initialize a const-qualified object you need to have a user-provided default constructor or use an initialiser:
const A aa = A();
Here the object aa is initialized with the expression A() which is a value-initialized object. You can value-initialize a class type without a default constructor, because value-initialization will set values to zero if there is no default constructor for the type.
However, the rule in the standard is too strict, as it forbids using implicitly-defined constructors even when there are no data members or all data members have sensible default constructors, so there is a defect report against the standard proposing to change it, see issue 253.
You don't state what compiler you're using. I've tried this with VS2012 and get a warning C4269.
The reason this is a problem is because aa is const. Because you haven't defined a constructor a default one is used and so i can be anything. It also cannot be changed (because aa is const).
If you define a constructor, it is assumed that you are happy with the initialization of i. Although, in this case you haven't actually changed the behaviour.
From this MSDN page
Since this instance of the class is generated on the stack, the initial value of m_data can be anything. Also, since it is a const instance, the value of m_data can never be changed.
Because i is not initialized.
class A
{
public:
A()
{
i =0;
}
int i;
};
"Implicit constructor" means a constructor generated for you automatically and generates an error because it realizes it is not able to initialize the value of i. This can be a no-args constructor, a copy constructor or (as of C++11) a move constructor.
why the implicit-defined constructor does not work?
It works just fine, but it does not decide what your default values are implicitly (and as such, it only calls default constructors for it's members, but not for POD types).
If you want the constructor to initialize your members with certain values you have to explicitly write that (i.e. add a default constructor explicitly).
Making a default (implicit) constructor initialize POD members with a chosen value (like zero for example) would add extra computing cycles (and slow your program down) when you don't need that. C++ is designed to behave as if you (the programmer) know what you are doing (i.e. if you do not initialize your members explicitly, the compiler assumes you don't care what default value you get).

Unrestricted union in practice

I have some questions about unrestricted unions and their application in practice.
Let's suppose I have the following code :
struct MyStruct
{
MyStruct(const std::vector<int>& a) : array(a), type(ARRAY)
{}
MyStruct(bool b) : boolean(b), type(BOOL)
{}
MyStruct(const MyStruct& ms) : type(ms.type)
{
if (type == ARRAY)
new (&array) std::vector<int>(ms.array);
else
boolean = ms.boolean;
}
MyStruct& operator=(const MyStruct& ms)
{
if (&ms != this) {
if (type == ARRAY)
array.~vector<int>(); // EDIT(2)
if (ms.type == ARRAY)
new (&array) std::vector<int>(ms.array);
else
boolean = ms.boolean;
type = ms.type;
}
return *this;
}
~MyStruct()
{
if (type == ARRAY)
array.~vector<int>();
}
union {
std::vector<int> array;
bool boolean;
};
enum {ARRAY, BOOL} type;
};
Is this code valid :) ?
Is it necessary to explicitly call the vector destructor each time we are using the boolean (as stated here http://cpp11standard.blogspot.com/2012/11/c11-standard-explained-1-unrestricted.html)
Why a placement new is required instead of just doing something like 'array = ms.array' ?
EDIT:
Yes, it compiles
"Members declared inside anonymous unions are actually members of the containing class, and can be initialized in the containing class's constructor." (C++11 anonymous union with non-trivial members)
Adding explicit destructors as suggested, leads to SIGSEV with g++ 4.8 / clang 4.2
The code's buggy: change array.clear(); to array.~vector<int>();
Explanation: operator= is using placement new over an object that hasn't been destructed, which could do anything but practically you can expect it to leak the dynamic memory the previous array had been using (clear() doesn't release memory / change capacity, it just destructs elements and changes size).
From 9.5/2:
If any non-static data member of a union has a non-trivial default
constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move
assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be
user-provided or it will be implicitly deleted (8.4.3) for the union.
So, the vector constructor, destructor etc never kicks in by themselves: you must call them explicitly when wanted.
In 9.5/3 there's an example:
Consider the following union:
union U {
int i;
float f;
std::string s;
};
Since std::string (21.3) declares non-trivial versions of all of the special member functions, U will have
an implicitly deleted default constructor, copy/move constructor, copy/move assignment operator, and destructor.
To use U, some or all of these member functions must be user-provided.
That last bit - "To use U, some or all of these member functions must be user-provided." - seems to presume that U needs to coordinate its own vaguely value-semantic behaviour, but in your case the surrouding struct is doing that so you don't need to define any of these union member functions.
2: we must call the array destructor whenever an array value is being replaced by a boolean value. If in operator= a new array value is being placement-newed instead of assigned, then the old array must also have its destructor called, but using operator= would be more efficient when the existing memory is sufficient for all the elements being copied. Basically, you must match constructions and destructions. UPDATE: the example code has a bug as per your comment below.
3: Why a placement new is required instead of just doing something like 'array = ms.array' ?
array = ms.array invokes std::vector<int>::operator= which always assumes the this pointer addresses an already properly constructed object. Inside that object you can expect there to be a pointer which will either be NULL or refer to some internal short-string buffer, or refer to heap. If your object hasn't been destructed, then operator= may well call a memory deallocation function on the bogus pointer. Placement new says "ignore the current content of the memory that this object will occupy, and construct a new object with valid members from scratch.
The union does not declare a default constructor, copy constructor, copy assignment operator, or destructor.
If std::string declares at least one non-trivial version of a special member function (which is the case), the forementioned ones are all implicitly deleted, and you must declare (and define) them (... if they're used, which is the case).
Insofar, that code isn't correct and should not successfully compile (this is almost-to-the-letter identical to the example in 9.5 par 3 of the standard, except there it's std::string, not std::vector).
(Does not apply for an anon union, as correctly pointed out)
About question (2): In order to safely switch the union, this is necessary, yes. The Standard explicitly says that in 9.5 par 4 [Note].
It makes sense too, if you think about it. At most one data member can be active in a union at any time, and they're not magically default constructed/destroyed, which means you need to properly construct/destruct things. It is not meaningful (or even defined) to use the union as something else otherwise (not that you couldn't do that anyway, but it's undefined).
The object is not a pointer, and you don't know whether it's allocated on the heap either (even if it is allocated on the heap, then it's inside another object, so it's still not allowable to delete it). How do you destroy an object if you can't call delete? How do you allocate an object -- possibly several times -- without leaking if you can't delete it? This doesn't leave many choices. Insofar, the [Note] makes perfect sense.

How do C++ class members get initialized if I don't do it explicitly?

Suppose I have a class with private memebers ptr, name, pname, rname, crname and age. What happens if I don't initialize them myself? Here is an example:
class Example {
private:
int *ptr;
string name;
string *pname;
string &rname;
const string &crname;
int age;
public:
Example() {}
};
And then I do:
int main() {
Example ex;
}
How are the members initialized in ex? What happens with pointers? Do string and int get 0-intialized with default constructors string() and int()? What about the reference member? Also what about const references?
I'd like to learn it so I can write better (bug free) programs. Any feedback would help!
In lieu of explicit initialization, initialization of members in classes works identically to initialization of local variables in functions.
For objects, their default constructor is called. For example, for std::string, the default constructor sets it to an empty string. If the object's class does not have a default constructor, it will be a compile error if you do not explicitly initialize it.
For primitive types (pointers, ints, etc), they are not initialized -- they contain whatever arbitrary junk happened to be at that memory location previously.
For references (e.g. std::string&), it is illegal not to initialize them, and your compiler will complain and refuse to compile such code. References must always be initialized.
So, in your specific case, if they are not explicitly initialized:
int *ptr; // Contains junk
string name; // Empty string
string *pname; // Contains junk
string &rname; // Compile error
const string &crname; // Compile error
int age; // Contains junk
First, let me explain what a mem-initializer-list is. A mem-initializer-list is a comma-separated list of mem-initializers, where each mem-initializer is a member name followed by (, followed by an expression-list, followed by a ). The expression-list is how the member is constructed. For example, in
static const char s_str[] = "bodacydo";
class Example
{
private:
int *ptr;
string name;
string *pname;
string &rname;
const string &crname;
int age;
public:
Example()
: name(s_str, s_str + 8), rname(name), crname(name), age(-4)
{
}
};
the mem-initializer-list of the user-supplied, no-arguments constructor is name(s_str, s_str + 8), rname(name), crname(name), age(-4). This mem-initializer-list means that the name member is initialized by the std::string constructor that takes two input iterators, the rname member is initialized with a reference to name, the crname member is initialized with a const-reference to name, and the age member is initialized with the value -4.
Each constructor has its own mem-initializer-list, and members can only be initialized in a prescribed order (basically the order in which the members are declared in the class). Thus, the members of Example can only be initialized in the order: ptr, name, pname, rname, crname, and age.
When you do not specify a mem-initializer of a member, the C++ standard says:
If the entity is a nonstatic data member ... of class type ..., the entity is default-initialized (8.5). ... Otherwise, the entity is not initialized.
Here, because name is a nonstatic data member of class type, it is default-initialized if no initializer for name was specified in the mem-initializer-list. All other members of Example do not have class type, so they are not initialized.
When the standard says that they are not initialized, this means that they can have any value. Thus, because the above code did not initialize pname, it could be anything.
Note that you still have to follow other rules, such as the rule that references must always be initialized. It is a compiler error to not initialize references.
You can also initialize data members at the point you declare them:
class another_example{
public:
another_example();
~another_example();
private:
int m_iInteger=10;
double m_dDouble=10.765;
};
I use this form pretty much exclusively, although I have read some people consider it 'bad form', perhaps because it was only recently introduced - I think in C++11. To me it is more logical.
Another useful facet to the new rules is how to initialize data-members that are themselves classes. For instance suppose that CDynamicString is a class that encapsulates string handling. It has a constructor that allows you specify its initial value CDynamicString(wchat_t* pstrInitialString). You might very well use this class as a data member inside another class - say a class that encapsulates a windows registry value which in this case stores a postal address. To 'hard code' the registry key name to which this writes you use braces:
class Registry_Entry{
public:
Registry_Entry();
~Registry_Entry();
Commit();//Writes data to registry.
Retrieve();//Reads data from registry;
private:
CDynamicString m_cKeyName{L"Postal Address"};
CDynamicString m_cAddress;
};
Note the second string class which holds the actual postal address does not have an initializer so its default constructor will be called on creation - perhaps automatically setting it to a blank string.
If you example class is instantiated on the stack, the contents of uninitialized scalar members is random and undefined.
For a global instance, uninitialized scalar members will be zeroed.
For members which are themselves instances of classes, their default constructors will be called, so your string object will get initialized.
int *ptr; //uninitialized pointer (or zeroed if global)
string name; //constructor called, initialized with empty string
string *pname; //uninitialized pointer (or zeroed if global)
string &rname; //compilation error if you fail to initialize this
const string &crname; //compilation error if you fail to initialize this
int age; //scalar value, uninitialized and random (or zeroed if global)
It depends on how the class is constructed
Answering this question comes understanding a huge switch case statement in the C++ language standard, and one which is hard for mere mortals to get intuition about.
As a simple example of how difficult thing are:
main.cpp
#include <cassert>
int main() {
struct C { int i; };
// This syntax is called "default initialization"
C a;
// i undefined
// This syntax is called "value initialization"
C b{};
assert(b.i == 0);
}
In default initialization you would start from: https://en.cppreference.com/w/cpp/language/default_initialization we go to the part "The effects of default initialization are" and start the case statement:
"if T is a non-POD": no (the definition of POD is in itself a huge switch statement)
"if T is an array type": no
"otherwise, nothing is done": therefore it is left with an undefined value
Then, if someone decides to value initialize we go to https://en.cppreference.com/w/cpp/language/value_initialization "The effects of value initialization are" and start the case statement:
"if T is a class type with no default constructor or with a user-provided or deleted default constructor": not the case. You will now spend 20 minutes Googling those terms:
we have an implicitly defined default constructor (in particular because no other constructor was defined)
it is not user-provided (implicitly defined)
it is not deleted (= delete)
"if T is a class type with a default constructor that is neither user-provided nor deleted": yes
"the object is zero-initialized and then it is default-initialized if it has a non-trivial default constructor": no non-trivial constructor, just zero-initialize. The definition of "zero-initialize" at least is simple and does what you expect: https://en.cppreference.com/w/cpp/language/zero_initialization
This is why I strongly recommend that you just never rely on "implicit" zero initialization. Unless there are strong performance reasons, explicitly initialize everything, either on the constructor if you defined one, or using aggregate initialization. Otherwise you make things very very risky for future developers.
Uninitialized non-static members will contain random data. Actually, they will just have the value of the memory location they are assigned to.
Of course for object parameters (like string) the object's constructor could do a default initialization.
In your example:
int *ptr; // will point to a random memory location
string name; // empty string (due to string's default costructor)
string *pname; // will point to a random memory location
string &rname; // it would't compile
const string &crname; // it would't compile
int age; // random value
Members with a constructor will have their default constructor called for initialisation.
You cannot depend on the contents of the other types.
If it is on the stack, the contents of uninitialized members that don't have their own constructor will be random and undefined. Even if it is global, it would be a bad idea to rely on them being zeroed out. Whether it is on the stack or not, if a member has its own constructor, that will get called to initialize it.
So, if you have string* pname, the pointer will contain random junk. but for string name, the default constructor for string will be called, giving you an empty string. For your reference type variables, I'm not sure, but it'll probably be a reference to some random chunk of memory.