Unrestricted union in practice - c++

I have some questions about unrestricted unions and their application in practice.
Let's suppose I have the following code :
struct MyStruct
{
MyStruct(const std::vector<int>& a) : array(a), type(ARRAY)
{}
MyStruct(bool b) : boolean(b), type(BOOL)
{}
MyStruct(const MyStruct& ms) : type(ms.type)
{
if (type == ARRAY)
new (&array) std::vector<int>(ms.array);
else
boolean = ms.boolean;
}
MyStruct& operator=(const MyStruct& ms)
{
if (&ms != this) {
if (type == ARRAY)
array.~vector<int>(); // EDIT(2)
if (ms.type == ARRAY)
new (&array) std::vector<int>(ms.array);
else
boolean = ms.boolean;
type = ms.type;
}
return *this;
}
~MyStruct()
{
if (type == ARRAY)
array.~vector<int>();
}
union {
std::vector<int> array;
bool boolean;
};
enum {ARRAY, BOOL} type;
};
Is this code valid :) ?
Is it necessary to explicitly call the vector destructor each time we are using the boolean (as stated here http://cpp11standard.blogspot.com/2012/11/c11-standard-explained-1-unrestricted.html)
Why a placement new is required instead of just doing something like 'array = ms.array' ?
EDIT:
Yes, it compiles
"Members declared inside anonymous unions are actually members of the containing class, and can be initialized in the containing class's constructor." (C++11 anonymous union with non-trivial members)
Adding explicit destructors as suggested, leads to SIGSEV with g++ 4.8 / clang 4.2

The code's buggy: change array.clear(); to array.~vector<int>();
Explanation: operator= is using placement new over an object that hasn't been destructed, which could do anything but practically you can expect it to leak the dynamic memory the previous array had been using (clear() doesn't release memory / change capacity, it just destructs elements and changes size).
From 9.5/2:
If any non-static data member of a union has a non-trivial default
constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move
assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be
user-provided or it will be implicitly deleted (8.4.3) for the union.
So, the vector constructor, destructor etc never kicks in by themselves: you must call them explicitly when wanted.
In 9.5/3 there's an example:
Consider the following union:
union U {
int i;
float f;
std::string s;
};
Since std::string (21.3) declares non-trivial versions of all of the special member functions, U will have
an implicitly deleted default constructor, copy/move constructor, copy/move assignment operator, and destructor.
To use U, some or all of these member functions must be user-provided.
That last bit - "To use U, some or all of these member functions must be user-provided." - seems to presume that U needs to coordinate its own vaguely value-semantic behaviour, but in your case the surrouding struct is doing that so you don't need to define any of these union member functions.
2: we must call the array destructor whenever an array value is being replaced by a boolean value. If in operator= a new array value is being placement-newed instead of assigned, then the old array must also have its destructor called, but using operator= would be more efficient when the existing memory is sufficient for all the elements being copied. Basically, you must match constructions and destructions. UPDATE: the example code has a bug as per your comment below.
3: Why a placement new is required instead of just doing something like 'array = ms.array' ?
array = ms.array invokes std::vector<int>::operator= which always assumes the this pointer addresses an already properly constructed object. Inside that object you can expect there to be a pointer which will either be NULL or refer to some internal short-string buffer, or refer to heap. If your object hasn't been destructed, then operator= may well call a memory deallocation function on the bogus pointer. Placement new says "ignore the current content of the memory that this object will occupy, and construct a new object with valid members from scratch.

The union does not declare a default constructor, copy constructor, copy assignment operator, or destructor.
If std::string declares at least one non-trivial version of a special member function (which is the case), the forementioned ones are all implicitly deleted, and you must declare (and define) them (... if they're used, which is the case).
Insofar, that code isn't correct and should not successfully compile (this is almost-to-the-letter identical to the example in 9.5 par 3 of the standard, except there it's std::string, not std::vector).
(Does not apply for an anon union, as correctly pointed out)
About question (2): In order to safely switch the union, this is necessary, yes. The Standard explicitly says that in 9.5 par 4 [Note].
It makes sense too, if you think about it. At most one data member can be active in a union at any time, and they're not magically default constructed/destroyed, which means you need to properly construct/destruct things. It is not meaningful (or even defined) to use the union as something else otherwise (not that you couldn't do that anyway, but it's undefined).
The object is not a pointer, and you don't know whether it's allocated on the heap either (even if it is allocated on the heap, then it's inside another object, so it's still not allowable to delete it). How do you destroy an object if you can't call delete? How do you allocate an object -- possibly several times -- without leaking if you can't delete it? This doesn't leave many choices. Insofar, the [Note] makes perfect sense.

Related

copy constructor of 'Car' is implicitly deleted because variant field 'fuelType' has a non-trivial copy constructor [duplicate]

i want to use string inside Union.
if i write as below
union U
{
int i;
float f;
string s;
};
Compiler gives error saying U::S has copy constructor.
I read some other post for alternate ways for solving this issue.
But i want to know why compiler doesn't allow this in the first place?
EDIT: #KennyTM: In any union, if member is initialized others will have garbage values, if none is initialized all will have garbage values. I think, tagged union just provides some comfort to access valid values from Union.
Your question: how do you or the compiler write a copy constructor for the union above without extra information?
sizeof(string) gives 4 bytes. Based on this, compiler can compare other members sizes and allocate largest allocation(4bytes in our example). Internal string length doesn't matter because it will be stored in a seperate location. Let the string be of any length. All that Union has to know is invoking string class copy constructor with string parameter. In whichever way compiler finds that copy constructor has to be invoked in normal case, similar method as to be followed even when string is inside Union. So i am thinking compiler could do like, allocate 4 bytes. Then if any string is assigned to s, then string class will take care of allocation and copying of that string using its own allocator. So there is no chance of memory corruption as well.
Is string not existed at the time of Union developement in compiler ?
So the answer is not clear to me still.
Am a new joinee in this site, if anything wrong, pls excuse me.
Because having a class with a non-trivial (copy/)constructor in a union doesn't make sense. Suppose we have
union U {
string x;
vector<int> y;
};
U u; // <--
If U was a struct, u.x and u.y would be initialized to an empty string and empty vector respectively. But members of a union share the same address. So, if u.x is initialized, u.y will contain invalid data, and so is the reverse. If both of them are not initialized then they cannot be used. In any case, having these data in a union cannot be handled easily, so C++98 chooses to deny this: (§9.5/1):
An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects.
In C++0x this rule has been relaxed (§9.5/2):
At most one non-static data member of a union may have a brace-or-equal-initializer. [Note: if any non-static data member of a union has a non-trivial default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move
assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be user-provided or it will be implicitly deleted (8.4.3) for the union. — end note ]
but it is still a not possible to create (correct) con/destructors for the union, e.g. how do you or the compiler write a copy constructor for the union above without extra information? To ensure which member of the union is active, you need a tagged union, and you need to handle the construction and destruction manually e.g.
struct TU {
int type;
union {
int i;
float f;
std::string s;
} u;
TU(const TU& tu) : type(tu.type) {
switch (tu.type) {
case TU_STRING: new(&u.s)(tu.u.s); break;
case TU_INT: u.i = tu.u.i; break;
case TU_FLOAT: u.f = tu.u.f; break;
}
}
~TU() {
if (tu.type == TU_STRING)
u.s.~string();
}
...
};
But, as #DeadMG has mentioned, this is already implemented as boost::variant or boost::any.
Think about it. How does the compiler know what type is in the union?
It doesn't. The fundamental operation of a union is essentially a bitwise cast. Operations on values contained within unions are only safe when each type can essentially be filled with garbage. std::string can't, because that would result in memory corruption. Use boost::variant or boost::any.
In C++98/03, members of a union can't have constructors, destructors, virtual member functions, or base classes.
So basically, you can only use built-in data types, or PODs
Note that it is changing in C++0x: Unrestricted unions
union {
int z;
double w;
string s; // Illegal in C++98, legal in C++0x.
};
From the C++ spec §9.5.1:
An object of a class with a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator cannot be a member of a union.
The reason for this rule is that the compiler will never know which of the destructors/constructors call, since it never really knows which of the possible objects is inside the union.
The garbage is introduced if you
assign a string
then assign an int or float
then a string again
string manages memory somewhere else. This information is most likely some pointer. This pointer is garbaged when assigning the int. Assigning a new string should destroy the old string, which is not possible.
The second step should destroy the string, but does not know, if there has been a string.
They obviously have found a solution for this problem in the meantime.
You can now do it.
Of course if you initialize any other member of the union first, or simply don't initialize the string at all, then there's a problem.
Since the string class overloads the assignment operator, you can't then initialize the string with an assignment operation:
this->union_string = std::string("whatever");
Will fail because you're still using the assignment operator.
To properly initialize a union string after you've put something else in the union or not initialized it in the first place, you have to call the constructor directly on that memory:
new(&this->union_string) std::string("whatever");
This way you're simply not using the assignment function at all.
Another concern is your compiler should make you make a destructor, and if for some reason not, you should make it anyway. Since it's a union, by the end of your class's lifetime the compiler can't know whether that union memory is used by the string or something else, so your destructor should call the string's destructor if that's the case.
So if you don't do it, you'll have a memory leak since the constructor for the string is never called, and it never knows to release the memory it's using.
In new C++ standard (I tested it in C++17), you can use a complex type as a member of union.
struct ustring
{
union
{
string s;
wstring ws;
};
bool bAscii = true;
~ustring()
{
if (bAscii)
{
s.~string();
}
else
{
ws.~wstring();
}
}
};
However, you should be very careful. Think about you construct s but destruct ws.

Is a constructor implicitly declared in the case of a abstract class with no data members? [duplicate]

In the book I'm reading at the moment (C++ Without Fear) it says that if you don't declare a default constructor for a class, the compiler supplies one for you, which "zeroes out each data member". I've experimented with this, and I'm not seeing any zeroing -out behaviour. I also can't find anything that mentions this on Google. Is this just an error or a quirk of a specific compiler?
If you do not define a constructor, the compiler will define a default constructor for you.
Construction
The implementation of this
default constructor is:
default construct the base class (if the base class does not have a default constructor, this is a compilation failure)
default construct each member variable in the order of declaration. (If a member does not have a default constructor, this is a compilation failure).
Note:
The POD data (int,float,pointer, etc.) do not have an explicit constructor but the default action is to do nothing (in the vane of C++ philosophy; we do not want to pay for something unless we explicitly ask for it).
Copy
If no destructor/copy Constructor/Copy Assignment operator is defined the compiler builds one of those for you (so a class always has a destructor/Copy Constructor/Assignment Operator (unless you cheat and explicitly declare one but don't define it)).
The default implementation is:
Destructor:
If user-defined destructor is defined, execute the code provided.
Call the destructor of each member in reverse order of declaration
Call the destructor of the base class.
Copy Constructor:
Call the Base class Copy Constructor.
Call the copy constructor for each member variable in the order of declaration.
Copy Assignment Operator:
Call the base class assignment operator
Call the copy assignment operator of each member variable in the order of declaration.
Return a reference to this.
Note Copy Construction/Assignment operator of POD Data is just copying the data (Hence the shallow copy problem associated with RAW pointers).
Move
If no destructor/copy Constructor/Copy Assignment/Move Constructor/Move Assignment operator is defined the compiler builds the move operators for you one of those for you.
The default implementation is:
Implicitly-declared move constructor
If no user-defined move constructors are provided for a class type (struct, class, or union), and all of the following is true:
Move Constructor:
Call the Base class Copy Constructor.
Call the move constructor for each member variable in the order of declaration.
Move Assignment Operator:
Call the base class assignment operator
Call the move assignment operator of each member variable in the order of declaration.
Return a reference to this.
I think it's worth pointing out that the default constructor will only be created by the compiler if you provide no constructor whatsoever. That means if you only provide one constructor that takes an argument, the compiler will not create the default no-arg constructor for you.
The zeroing-out behavior that your book talks about is probably specific to a particular compiler. I've always assumed that it can vary and that you should explicitly initialize any data members.
Does the compiler automatically generate a default constructor?
Does the implicitly generated default constructor perform zero
initialization?
If you legalistically parse the language of the 2003 standard, then the answers are yes, and no. However, this isn't the whole story because unlike a user-defined default constructor, an implicitly defined default constructor is not always used when creating an object from scratch -- there are two other scenarios: no construction and member-wise value-initialization.
The "no construction" case is really just a technicality because it is functionally no different than calling the trivial default constructor. The other case is more interesting: member-wise value-initialization is invoked by using "()" [as if explicitly invoking a constructor that has no arguments] and it bypasses what is technically referred to as the default constructor. Instead it recursively performs value-initialization on each data member, and for primitive data types, this ultimately resolves to zero-initialization.
So in effect, the compiler provides two different implicitly defined default constructors. One of which does perform zero initialization of primitive member data and the other of which does not. Here are some examples of how you can invoke each type of constructor:
MyClass a; // default-construction or no construction
MyClass b = MyClass(); // member-wise value-initialization
and
new MyClass; // default-construction or no construction
new MyClass(); // member-wise value-initialization
Note: If a user-declared default constructor does exist, then member-wise value-initialization simply calls that and stops.
Here's a somewhat detailed breakdown of what the standard says about this...
If you don't declare a constructor, the compiler implicitly creates a default constructor [12.1-5]
The default constructor does not initialize primitive types [12.1-7]
MyClass() {} // implicitly defined constructor
If you initialize an object with "()", this does not directly invoke the default constructor. Instead, it instigates a long sequence of rules called value-initialization [8.5-7]
The net effect of value initialization is that the implicitly declared default constructor is never called. Instead, a recursive member-wise value initialization is invoked which will ultimately zero-initialize any primitive members and calls the default constructor on any members which have a user-declared constructor [8.5-5]
Value-initialization applies even to primitive types -- they will be zero-initialized. [8.5-5]
int a = int(); // equivalent to int a = 0;
All of this is really moot for most purposes. The writer of a class cannot generally assume that data members will be zeroed out during an implicit initialization sequence -- so any self-managing class should define its own constructor if it has any primitive data members that require initialization.
So when does this matter?
There may be circumstances where generic code wants to force initialization of unknown types. Value-initialization provides a way to do this. Just remember that implicit zero-initialization does not occur if the user has provided a constructor.
By default, data contained by std::vector is value-initialized. This can prevent memory debuggers from identifying logic errors associated with otherwise uninitialized memory buffers.
vector::resize( size_type sz, T c=T() ); // default c is "value-initialized"
Entire arrays of primitives type or "plain-old-data" (POD)-type structures can be zero-initialized by using value-initialization syntax.
new int[100]();
This post has more details about variations between versions of the standard, and it also notes a case where the standard is applied differently in major compilers.
C++ does generate a default constructor but only if you don't provide one of your own. The standard says nothing about zeroing out data members. By default when you first construct any object, they're undefined.
This might be confusing because most of the C++ primitive types DO have default "constructors" that init them to zero (int(), bool(), double(), long(), etc.), but the compiler doesn't call them to init POD members like it does for object members.
It's worth noting that the STL does use these constructors to default-construct the contents of containers that hold primitive types. You can take a look at this question for more details on how things in STL containers get inited.
The default constructor created for a class will not initialize built-in types, but it will call the default constructor on all user-defined members:
class Foo
{
public:
int x;
Foo() : x(1) {}
};
class Bar
{
public:
int y;
Foo f;
Foo *fp;
};
int main()
{
Bar b1;
ASSERT(b1.f.x == 1);
// We know nothing about what b1.y is set to, or what b1.fp is set to.
// The class members' initialization parallels normal stack initialization.
int y;
Foo f;
Foo *fp;
ASSERT(f.x == 1);
// We know nothing about what y is set to, or what fp is set to.
}
The compiler will generate default constructors and destructors if user-created ones are not present. These will NOT modify the state of any data members.
In C++ (and C) the contents of any allocated data is not guaranteed. In debug configurations some platforms will set this to a known value (e.g. 0xFEFEFEFE) to help identify bugs, but this should not be relied upon.
Zero-ing out only occurs for globals. So if your object is declared in the global scope, its members will be zero-ed out:
class Blah
{
public:
int x;
int y;
};
Blah global;
int main(int argc, char **argv) {
Blah local;
cout<<global.x<<endl; // will be 0
cout<<local.x<<endl; // will be random
}
C++ does not guarantee zeroing out memory. Java and C# do (in a manner of speaking).
Some compilers might, but don't depend on that.
In C++11, a default constructor generated by the compiler is marked deleted , if :
the class has a reference field
or a const field without a user-defined default constructor
or a field without a default initializer, with a deleted default constructor
http://en.cppreference.com/w/cpp/language/default_constructor
C++ generates a default constructor. If needed (determined at compile time I believe), it will also generate a default copy constructor and a default assignment constructor. I haven't heard anything about guarantees for zeroing memory though.
The compiler by default will not be generating the default constructor unless the implementation does not require one .
So , basically the constructor has to be a non-trivial constructor.
For constructor to be non-trivial constructor, following are the conditions in which any one can suffice:
1) The class has a virtual member function.
2) Class member sub-objects or base classes have non-trivial constructors.
3) A class has virtual inheritance hierarchy.

Can I use the = operator to assign one object's values to another object, without overloading the operator?

this was a recent T/F question that came up in my cs course that I found a bit confusing.
The textbook states:
The = operator may be used to assign one object's data to another object, or to initialize one object with another object's data. By default, each member of one object is copied to its counterpart in the other object.
The question verbatim was:
You cannot use the = operator to assign one object's values to another object, unless you overload the operator. T/F?
Judging from that particular paragraph of the textbook, I answered false. However, it turns out the quiz answer was actually true.
When I looked up the question online, I saw other sources listing the answer as "false" as well. Granted, these were just generic flashcard / quiz websites, so I don't put much stock in them.
Basically, I'm just curious what the real answer is for future studying purposes.
P.S.: The textbook later goes on to state: "In order to change the way the assignment operator works, it must be overloaded. Operator overloading permits you to redefine an existing operator’s behavior when used with a class
object."
I feel like this is related and supports the "true" answer, but I'm not really sure.
The statement
” You cannot use the = operator to assign one object's values to another object, unless you overload the operator.
… is trivially false, because you can assign to e.g. an int variable. Which is an object. In C++.
Possibly they intended to write “class-type object”, but even so it's false. E.g. any POD (Plain Old Data) class type object is assignable and lacks an overloaded assignment operator.
However, there are cases where you want or need to prohibit or take charge of assignment.
If you don't implement the assignment operator yourself, the compiler will generate one for you, which will copy the data from the source to the destination, invoking assignment operators of the members of your class where possible/necessary.
This does not work if your class e.g. features const members, and it does not yield the desired result if your class contains pointers to dynamically allocated objects.
So, I would also say that the statement is false.
That's a trick question. The syntactic answer is "false" but the semantic answer is "maybe".
struct Foo
{
int a;
double b;
};
Foo foo1;
Foo foo2;
foo2 = foo1; // Ok in both senses.
struct Bar
{
Bar() : arr(new int[20]) {}
~Bar() { delete [] arr; }
int* arr;
};
Bar bar1;
Bar bar2;
bar2 = bar1; // Ok in syntax not in semantics
// This will lead to UB
In C++, when User-Defined Types such as class and struct are involved then its better to overload operator= for the Type. Using default assignment operator with object of class or struct will involve Shallow Copy. Shallow Copy sometimes lead to undefined behavior when the class or struct objects have dynamically allocated memory inside them.
Appropriately overloading operator= for class and struct types will lead to Deep Copy which is the correct method of assigning an object ObjA to object ObjB (ObjB = ObjA) when ObjA and ObjB are objects of some class or struct and contain dynamically allocated memory inside them.
The = operator may be used to assign one object's data to another object, or to initialize one object with another object's data. By default, each member of one object is copied to its counterpart in the other object. This is true when object of class/struct has only static fundamental type of data inside it. Here,
by static it means that all the memory requirement of object is fixed at compile time only.
You cannot use the default = operator to assign one object's values to another object, unless you overload the operator. This is true when object of class/struct has some dynamically allocated memory inside it, possibly by using pointers.
For more reference see:
What is the difference between a deep copy and a shallow copy?
You cannot use the = operator to assign one object's values to another object, unless you overload the operator. T/F?
Ans: F
You can assign one object's values to another object without overloading = operator, as compiler by default defines assignment operator for the class in all case except for below
you have explicitly declared a copy-assignment operator (for class X an operator = taking X, X& or const X&) )
there is a member in your class that is not assignable (such as a reference, a const object or a class with no or inaccessible assignment operator)
But you need to understand why it becomes mandatory to overload the assignment operator in case you are having user defined members or having dynamic memory allocation,for this read concept of shallow and deep copy

move constructors for vectors of shared_ptr<MyClass>

I understand if you wish to pass a vector of MyClass objects and it is a temporary variable, if there is a move constructor defined for MyClass then this will be called, but what happens if you pass a vector of boost::shared_ptr<MyClass> or std::shared_ptr<MyClass>? Does the shared_ptr have a move constructor which then call's MyClass's move constructor?
if there is a move constructor defined for MyClass then this will be called
Usually not. Moving a vector is usually done my transferring ownership of the managed array, leaving the moved-from vector empty. The objects themselves aren't touched. (I think there may be an exception if the two vectors have incompatible allocators, but that's beyond anything I've ever needed to deal with, so I'm not sure about the details there).
Does the shared_ptr have a move constructor which then call's MyClass's move constructor?
No. Again, it has a move constructor which transfers ownership of the MyClass object to the new pointer, leaving the old pointer empty. The object itself is untouched.
Yes, std::shared_ptr<T> has a move constructor, as well as a templated constructor that can move from related shared pointers, but it does not touch the managed object at all. The newly constructed shared pointer shares ownership of the managed object (if there was one), and the moved-from pointer is disengaged ("null").
Example:
struct Base {}; // N.B.: No need for a virtual destructor
struct Derived : Base {};
auto p = std::make_shared<Derived>();
std::shared_ptr<Base> q = std::move(p);
assert(!p);
If you mean moving std::vector<std::shared_ptr<MyClass>>. Then even the move constructor of std::shared_ptr won't be called. Because the move operation is directly done on std::vectorlevel.
For example, a std::vector<T> may be implemented as a pointer to array of T, and a size member. The move constructor for this can be implemented as:
template <typename T>
class vector {
public:
/* ... other members */
vector(vector &&another): _p(another._p), _size(another._size) {
/* Transfer data ownership */
another._p = nullptr;
another._size = 0;
}
private:
T *_p;
size_t _size;
}
You can see in this process, no data member of type T is touched at all.
EDIT: More specially in C++11 Standard: §23.2.1. General container requirements (4) there is a table contains requirements on implementations of general containers, which contains following requirements:
(X is the type of the elements, u is an identifier declaration, rv is rvalue reference, a is a container of type X)
X u(rv)
X u = rv
C++ Standard: These two (move constructors) should have constant time complexity for all standard containers except std::array.
So it's easy to conclude implementations must use a way like I pointed above for move constructors of std::vector since it cannot invoke move constructors of individual elements or the time complexity will become linear time.
a = rv
C++ Standard: All existing elements of a are either move assigned to or destroyed a shall be equal to the value that rv had before this assignment.
This is for move assign operator. This sentence only states that original elements in a should be "properly handled" (either move-assigned in or destroyed). But this is not a strict requirement. IMHO implementations can choose the best suited way.
I also looked at code in Visual C++ 2013 and this is the snippet I found (vector header, starting from line 836):
/* Directly move, like code above */
void _Assign_rv(_Myt&& _Right, true_type)
{ // move from _Right, stealing its contents
this->_Swap_all((_Myt&)_Right);
this->_Myfirst = _Right._Myfirst;
this->_Mylast = _Right._Mylast;
this->_Myend = _Right._Myend;
_Right._Myfirst = pointer();
_Right._Mylast = pointer();
_Right._Myend = pointer();
}
/* Both move assignment operator and move constructor will call this */
void _Assign_rv(_Myt&& _Right, false_type)
{ // move from _Right, possibly moving its contents
if (get_allocator() == _Right.get_allocator())
_Assign_rv(_STD forward<_Myt>(_Right), true_type());
else
_Construct(_STD make_move_iterator(_Right.begin()),
_STD make_move_iterator(_Right.end()));
}
In this code the operation is clear: if both this and right operand have the same allocator, it will directly steal contents without doing anything on individual elements. But if they haven't, then move operations of individual elements will be called. At this time, other answers apply (for std::shared_ptr stuff).

Why compiler doesn't allow std::string inside union?

i want to use string inside Union.
if i write as below
union U
{
int i;
float f;
string s;
};
Compiler gives error saying U::S has copy constructor.
I read some other post for alternate ways for solving this issue.
But i want to know why compiler doesn't allow this in the first place?
EDIT: #KennyTM: In any union, if member is initialized others will have garbage values, if none is initialized all will have garbage values. I think, tagged union just provides some comfort to access valid values from Union.
Your question: how do you or the compiler write a copy constructor for the union above without extra information?
sizeof(string) gives 4 bytes. Based on this, compiler can compare other members sizes and allocate largest allocation(4bytes in our example). Internal string length doesn't matter because it will be stored in a seperate location. Let the string be of any length. All that Union has to know is invoking string class copy constructor with string parameter. In whichever way compiler finds that copy constructor has to be invoked in normal case, similar method as to be followed even when string is inside Union. So i am thinking compiler could do like, allocate 4 bytes. Then if any string is assigned to s, then string class will take care of allocation and copying of that string using its own allocator. So there is no chance of memory corruption as well.
Is string not existed at the time of Union developement in compiler ?
So the answer is not clear to me still.
Am a new joinee in this site, if anything wrong, pls excuse me.
Because having a class with a non-trivial (copy/)constructor in a union doesn't make sense. Suppose we have
union U {
string x;
vector<int> y;
};
U u; // <--
If U was a struct, u.x and u.y would be initialized to an empty string and empty vector respectively. But members of a union share the same address. So, if u.x is initialized, u.y will contain invalid data, and so is the reverse. If both of them are not initialized then they cannot be used. In any case, having these data in a union cannot be handled easily, so C++98 chooses to deny this: (§9.5/1):
An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects.
In C++0x this rule has been relaxed (§9.5/2):
At most one non-static data member of a union may have a brace-or-equal-initializer. [Note: if any non-static data member of a union has a non-trivial default constructor (12.1), copy constructor (12.8), move constructor (12.8), copy assignment operator (12.8), move
assignment operator (12.8), or destructor (12.4), the corresponding member function of the union must be user-provided or it will be implicitly deleted (8.4.3) for the union. — end note ]
but it is still a not possible to create (correct) con/destructors for the union, e.g. how do you or the compiler write a copy constructor for the union above without extra information? To ensure which member of the union is active, you need a tagged union, and you need to handle the construction and destruction manually e.g.
struct TU {
int type;
union {
int i;
float f;
std::string s;
} u;
TU(const TU& tu) : type(tu.type) {
switch (tu.type) {
case TU_STRING: new(&u.s)(tu.u.s); break;
case TU_INT: u.i = tu.u.i; break;
case TU_FLOAT: u.f = tu.u.f; break;
}
}
~TU() {
if (tu.type == TU_STRING)
u.s.~string();
}
...
};
But, as #DeadMG has mentioned, this is already implemented as boost::variant or boost::any.
Think about it. How does the compiler know what type is in the union?
It doesn't. The fundamental operation of a union is essentially a bitwise cast. Operations on values contained within unions are only safe when each type can essentially be filled with garbage. std::string can't, because that would result in memory corruption. Use boost::variant or boost::any.
In C++98/03, members of a union can't have constructors, destructors, virtual member functions, or base classes.
So basically, you can only use built-in data types, or PODs
Note that it is changing in C++0x: Unrestricted unions
union {
int z;
double w;
string s; // Illegal in C++98, legal in C++0x.
};
From the C++ spec §9.5.1:
An object of a class with a non-trivial constructor, a non-trivial copy constructor, a non-trivial destructor, or a non-trivial copy assignment operator cannot be a member of a union.
The reason for this rule is that the compiler will never know which of the destructors/constructors call, since it never really knows which of the possible objects is inside the union.
The garbage is introduced if you
assign a string
then assign an int or float
then a string again
string manages memory somewhere else. This information is most likely some pointer. This pointer is garbaged when assigning the int. Assigning a new string should destroy the old string, which is not possible.
The second step should destroy the string, but does not know, if there has been a string.
They obviously have found a solution for this problem in the meantime.
You can now do it.
Of course if you initialize any other member of the union first, or simply don't initialize the string at all, then there's a problem.
Since the string class overloads the assignment operator, you can't then initialize the string with an assignment operation:
this->union_string = std::string("whatever");
Will fail because you're still using the assignment operator.
To properly initialize a union string after you've put something else in the union or not initialized it in the first place, you have to call the constructor directly on that memory:
new(&this->union_string) std::string("whatever");
This way you're simply not using the assignment function at all.
Another concern is your compiler should make you make a destructor, and if for some reason not, you should make it anyway. Since it's a union, by the end of your class's lifetime the compiler can't know whether that union memory is used by the string or something else, so your destructor should call the string's destructor if that's the case.
So if you don't do it, you'll have a memory leak since the constructor for the string is never called, and it never knows to release the memory it's using.
In new C++ standard (I tested it in C++17), you can use a complex type as a member of union.
struct ustring
{
union
{
string s;
wstring ws;
};
bool bAscii = true;
~ustring()
{
if (bAscii)
{
s.~string();
}
else
{
ws.~wstring();
}
}
};
However, you should be very careful. Think about you construct s but destruct ws.