In my quest to learn C++ I stumbled across the article Writing Copy Constructors and Assignment Operators which proposes a mechanism to avoid code duplication across copy constructors and assignment operators.
To summarise/duplicate the content of that link, the proposed mechanism is:
struct UtilityClass
{
...
UtilityClass(UtilityClass const &rhs)
: data_(new int(*rhs_.data_))
{
// nothing left to do here
}
UtilityClass &operator=(UtilityClass const &rhs)
{
//
// Leaves all the work to the copy constructor.
//
if(this != &rhs)
{
// deconstruct myself
this->UtilityClass::~UtilityClass();
// reconstruct myself by copying from the right hand side.
new(this) UtilityClass(rhs);
}
return *this;
}
...
};
This seems like a nice way to avoid code duplication whilst ensuring "programatic integrity" but needs weighed against the risk of wasting effort freeing-then-allocating nested memory that could, instead, be re-used (as its author points out).
But I'm not familiar with the syntax that lies at its core:
this->UtilityClass::~UtilityClass()
I assume that this is a way to call the object's destructor (to destroy the contents of the object structure) whilst keeping the structure itself. To a C++ newbie, the syntax looks like a strange mixture of an object method and a class method.
Could someone please explain this syntax to me, or point me to a resource which explains it?
How does that call differ from the following?
this->~UtilityClass()
Is this a legitimate call? Does this additionally destroy the object structure (free from heap; pop off the stack)?
TL;DR version: DO NOT FOLLOW ANY ADVICE GIVEN BY THE AUTHOR OF THAT LINK
The link suggests that this technique can be used in a base class as long as a virtual destructor call is not used, because doing so would destruct parts of the derived class, which isn't the responsibility of the base class operator=.
This line of reasoning totally fails. The technique can never ever be used in a base class. The reason is that the C++ Standard only allows in-place replacement of an object with another object of the exact same type (see section 3.8 of the Standard):
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied, and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and
the original object was a most derived object (1.8) of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).
In the original code, both return *this; and subsequent use of the object are undefined behavior; they access an object which has been destroyed, not the newly created object.
This is a problem in practice as well: the placement-new call will set up a v-table ptr corresponding to the base class, not the correct derived type of the object.
Even for leaf classes (non-base classes) the technique is highly questionable.
TL;DR DO NOT DO THIS.
To answer the specific question:
In this particular example, there's no difference. As explained in the article you link to, there would be a difference if this were a polymorphic base class, with a virtual destructor.
A qualified call:
this->UtilityClass::~UtilityClass()
would specifically call the destructor of this class, not that of the most derived class. So it only destroys the subobject being assigned to, not the entire object.
An unqualified call:
this->~UtilityClass()
would use virtual dispatch to call the most derived destructor, destroying the complete object.
The article writer claims that the first is what you want, so that you only assign to the base sub-object, leaving the derived parts intact. However, what you actually do overwrite part of the object with a new object of the base type; you've changed the dynamic type, and leaked whatever was in the derived parts of the old object. This is a bad thing to do in any circumstances. You've also introduced an exception issue: if the construction of the new object fails, then the old object is left in an invalid state, and can't even be safely destroyed.
UPDATE: you also have undefined behaviour since, as described in another answer, it's forbidden to use placement-new to create an object on top of (part of) a differently-typed object.
For non-polymorphic types, a good way to write a copy-assignment operator is with the copy-and-swap idiom. That both avoids duplication by reusing the copy-constructor, and provides a strong exception guarantee - if the assignment fails, then the original object is unmodified.
For polymorphic types, copying objects is more involved, and can't generally be done with a simple assignment operator. A common approach is a virtual clone function, which each type overrides to dynamically allocate a copy of itself with the correct type.
You can decide to how to call the destructor:
this->MyClass::~MyClass(); // Non-virtual call
this->~MyClass(); // Virtual call
Related
I'm not a C++ guru, navigating some code from an open source project trying to resolve an issue we have with the Java interface and the documentation is terrible. We've resolved that it may be due to the fact that a cloned object is what is used and not the originally instantiated object. The clone is created in the following method:
Base* Computer::Clone() const
{
Base *clone = new Computer(*this);
return (clone);
}
Computer is a subclass of Base so it returns a pointer to a Base object, created using the constructor of the same class, Computer and is done so via dereference (I believe thats what this is called here) with *this.
Now assuming Computer has an attribute bool isModeConfigured, if that is set to false for the original object, if the clone sets its to true does that replicate through? My inclination here is to say no, given the use of new.
This expression
new Computer(*this) calls the copy constructor Computer::Computer(const Computer&), whatever happens in there depends on its implementation.
Default implementation provided by the compiler (given the requirements for its generation are met) does member-wise copy of all attributes. "copy" means calling copy constructors of all members, same as above.
If you have
class Computer{
bool member;
}
Then the new instance has a copy of member and changing it on one object leaves the other unaffected. Hence both objects are totally independent.
AFAIK all C++ containers have the sane semantics and copies are independent of each other. The problem is with any kind of pointer, since default implementation (even for smart ones) is to make shallow copies. If you had (and you shouldn't) bool* member, both objects would share the pointed-to object.
Note: Never use new (unless you have to but those cases are rare), the modern, safe implementation is
std::unique_ptr<Base> Computer::Clone() const
{
return std::make_unique<Computer>(*this);
}
In the comments and answers to this question:
Virtual function compiler optimization c++
it is argued that a virtual function call in a loop cannot be devirtualized, because the virtual function might replace this by another object using placement new, e.g.:
void A::foo() { // virtual
static_assert(sizeof(A) == sizeof(Derived));
new(this) Derived;
}
The example is from a LLVM blog article about devirtualization
Now my question is: is that allowed by the standard?
I could find this on cppreference about storage reuse: (emphasis mine)
A program is not required to call the destructor of an object to end its lifetime if the object is trivially-destructible or if the program does not rely on the side effects of the destructor. However, if a program ends the lifetime of an non-trivial object, it must ensure that a new object of the same type is constructed in-place (e.g. via placement new) before the destructor may be called implicitly
If the new object must have the same type, it must have the same virtual functions. So it is not possible to have a different virtual function, and thus, devirtualization is acceptable.
Or do I misunderstand something?
The quote you provided says:
If a program ends the lifetime of an non-trivial object, it must ensure that a new object of the same type is constructed in-place (e.g. via placement new) before the destructor may be called implicitly
The intent of this statement relates to something a bit different to what you are doing. The statement is meant to say that when you destroy an object without destroying its name, something still refers to that storage with the original type, o you need to construct a new object there so that when the implicit destruction occurs, there is a valid object to destroy. This is relevant for example if you have an automatic ("stack") variable, and you call its destructor--you need to construct a new instance there before the destructor is called when the variable goes out of scope.
The statement as a whole, and its "of the same type" clause in particular, has no bearing on the topic you're discussing, which is whether you are allowed to construct a different polymorphic type having the same storage requirements in place of an old one. I don't know of any reason why you shouldn't be allowed to do that.
Now, that being said, the question you linked to is doing something different: it is calling a function using implicit this in a loop, and the question is whether the compiler could assume that the vptr for this will not change in that loop. I believe the compiler could (and clang -fstrict-vtable-pointers does) assume this, because this is only valid if the type is the same after the placement new.
So while the quotes from the standard you have provided are not relevant to this issue, the end result is that it does seem possible for an optimizer to devirtualize function calls made in a loop under the assumption that the type of *this (or its vptr) cannot change. The type of an object stored at an address (and its vptr) can change, but if it does, the old this is no longer valid.
It appears that you intend to use the new object using handles (pointers, references, or the original variable name) that existed prior to its recreation. That's allowed only if the instance type is not changed, plus some other conditions excluding const objects and sub-objects:
From [basic.life]:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied,
and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and
the original object was a most derived object of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).
Your quote from the Standard is merely a consequence of this one.
Your proposed "devirtualization counter-example" does not meet these requirements, therefore all attempts to access the object after it is replaced will cause undefined behavior.
The blog post even pointed this out, in the very next sentence after the example code you looked at.
Is it possible to use placement-new to change the type of a polymorphic object? if not, what exactly in the standard prohibits it?
Consider this code:
#include <new>
struct Animal {
Animal();
virtual ~Animal();
virtual void breathe();
void kill();
void *data;
};
struct Dead: Animal {
void breathe() override;
};
void Animal::kill() {
this->~Animal();
new(this) Dead;
}
Is calling "kill" ever legal?
Update: Early comments do not address the (il)legality according to the standard of the programming technique shown here of changing the type of an object by explicit call to destructor and applying placement-new for a new object compatible.
Because there is interest in why would anyone would like to do this, I can mention the use case that led me to the question, although it is not relevant to the question I am asking.
Imagine you have a polymorphic type hierarchy with several virtual methods. During the lifetime of the object, things happen that can be modeled in the code as the object changing its type. There are many perfectly legal ways to program this, for example, keeping the object as a pointer, smart or not, and swapping in a copy of the other type. But this may be expensive: One has to clone or move the original object into another one of a different type just to swap the new in.
In GCC, Clang, and others, changing the type of an object can be as cheap as to simply changing the virtual table pointer, but in portable C++ this is not possible except by constructing in-place an object of a new type.
In my original use case, the object could not be held as a pointer either.
I would like to know what the standard says on the subject of reusing memory.
[basic.life]/8 If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
...
(8.4) — the original object was a most derived object (1.8) of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).
In your example, a call to kill() itself may be valid (if it so happens that sizeof(Animal)==sizeof(Dead), which I don't think is guaranteed), but most attempts to use Animal* pointer or Animal& lvalue through which that call was made would trigger undefined behavior by way of accessing the object whose lifetime has ended. Even assuming that the stars align and Animal subobject of Dead perfectly overlays the original location of the original stand-alone Animal object, such a pointer or lvalue is not considered to refer to the former, but to the now-expired latter.
I am exploring the possibility of implementing true (partially) immutable data structures in C++. As C++ does not seem to distinguish between a variable and the object that variable stores, the only way to truly replace the object (without assignment operation!) is to use placement new:
auto var = Immutable(state0);
// the following is illegal as it requires assignment to
// an immutable object
var = Immutable(state1);
// however, the following would work as it constructs a new object
// in place of the old one
new (&var) Immutable(state1);
Assuming that there is no non-trivial destructor to run, is this legal in C++ or should I expect undefined behaviour? If its standard-dependant, which is the minimal/maximal standard version where I can expect this to work?
Addendum: since it seems people still read this in 2019, a quick note — this pattern is actually legally possible in modern (post 17) C++ using std::launder().
What you wrote is technically legal but almost certainly useless.
Suppose
struct Immutable {
const int x;
Immutable(int val):x(val) {}
};
for our really simple immutable type.
auto var = Immutable(0);
::new (&var) Immutable(1);
this is perfectly legal.
And useless, because you cannot use var to refer to the state of the Immutable(1) you stored within it after the placement new. Any such access is undefined behavior.
You can do this:
auto var = Immutable(0);
auto* pvar1 = ::new (&var) Immutable(1);
and access to *pvar1 is legal. You can even do:
auto var = Immutable(0);
auto& var1 = *(::new (&var) Immutable(1));
but under no circumstance may you ever refer to var after you placement new'd over it.
Actual const data in C++ is a promise to the compiler that you'll never, ever change the value. This is in comparison to references to const or pointers to const, which is just a suggestion that you won't modify the data.
Members of structures declared const are "actually const". The compiler will presume they are never modified, and won't bother to prove it.
You creating a new instance in the spot where an old one was in effect violates this assumption.
You are permitted to do this, but you cannot use the old names or pointers to refer to it. C++ lets you shoot yourself in the foot. Go right ahead, we dare you.
This is why this technique is legal, but almost completely useless. A good optimizer with static single assignment already knows that you would stop using var at that point, and creating
auto var1 = Immutable(1);
it could very well reuse the storage.
Caling placement new on top of another variable is usually defined behaviour. It is usually a bad idea, and it is fragile.
Doing so ends the lifetime of the old object without calling the destructor. References and pointers to and the name of the old object refer to the new one if some specific assumptions hold (exact same type, no const problems).
Modifying data declared const, or a class containing const fields, results in undefined behaviour at the drop of a pin. This includes ending the lifetime of an automatic storage field declared const and creating a new object at that location. The old names and pointers and references are not safe to use.
[Basic.life 3.8]/8:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or
released, a new object is created at the storage location which the original object occupied, a pointer that
pointed to the original object, a reference that referred to the original object, or the name of the original
object will automatically refer to the new object and, once the lifetime of the new object has started, can be
used to manipulate the new object, if:
(8.1)
the storage for the new object exactly overlays the storage location which the original object occupied,
and
(8.2)
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
(8.3)
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static
data member whose type is const-qualified or a reference type, and
(8.4)
the original object was a most derived object (1.8) of type
T
and the new object is a most derived
object of type
T
(that is, they are not base class subobjects).
In short, if your immutability is encoded via const members, using the old name or pointers to the old content is undefined behavior.
You may use the return value of placement new to refer to the new object, and nothing else.
Exception possibilities make it extremely difficult to prevent code that exdcutes undefined behaviour or has to summarially exit.
If you want reference semantics, either use a smart pointer to a const object or an optional const object. Both handle object lifetime. The first requires heap allocation but permits move (and possibly shared references), the second permits automatic storage. Both move manual object lifetime management out of business logic. Now, both are nullable, but avoiding that robustly is difficult doing it manually anyhow.
Also consider copy on write pointers that permit logically const data with mutation for efficiency purposes.
From the C++ standard draft N4296:
3.8 Object lifetime
[...]
The lifetime of an object of type T ends when:
(1.3) — if T is a class
type with a non-trivial destructor (12.4), the destructor call starts,
or
(1.4) — the storage which the object occupies is reused or
released.
[...]
4 A program may end the lifetime of any object by
reusing the storage which the object occupies or by explicitly calling
the destructor for an object of a class type with a non-trivial
destructor. For an object of a class type with a non-trivial
destructor, the program is not required to call the destructor
explicitly before the storage which the object occupies is reused or
released; however, if there is no explicit call to the destructor or
if a delete-expression (5.3.5) is not used to release the storage, the
destructor shall not be implicitly called and any program that depends
on the side effects produced by the destructor has undefined behavior.
So yes, you can end the lifetime of an object by reusing its memory, even of one with non-trivial destructor, as long as you don't depend on the side effects of the destructor call.
This applies when you have non-const instances of objects like struct ImmutableBounds { const void* start; const void* end; }
You've actually asked 3 different questions :)
1. The contract of immutability
It's just that - a contract, not a language construct.
In Java for instance, instances of String class are immutable. But that means that all methods of the class have been designed to return new instances of class rather than modifying the instance.
So if you would like to make Java's String into a mutable object, you couldn't, without having access to its source code.
Same applies to classes written in C++, or any other language. You have an option to create a wrapper (or use a Proxy pattern), but that's it.
2. Using placement constructor and allocating into an initialized are off memory.
That's actually what they were created to do in the first place.
The most common use case for the placement constructor are memory pools - you preallocate a large memory buffer, and then you allocate your stuff into it.
So yes - it is legal, and nobody won't mind.
3. Overwriting class instance's contents using a placement allocator.
Don't do that.
There's a special construct that handles this type of operation, and it's called a copy constructor.
Say, the code
class Derived: public Base {....}
Base* b_ptr = new( malloc(sizeof(Derived)) ) Base(1);
b_ptr->f(2);
Derived* d_ptr = new(b_ptr) Derived(3);
b_ptr->g(4);
d_ptr->f(5);
seems to be reasonable and LSP is satisfied.
I suspect that this code is standard-allowed when Base and Derived are POD, and disallowed otherwise (because vtbl ptr is overwritten). The first part of my question is: please point out precise precondition of such an overwrite.
There may exist other standard-allowed ways to write over.
The second part of my question is: is there any other ways? What are their precise preconditions?
UPDATE: I do NOT want to write code like this; I am interested in theoretical possiblity (or impossibiliy) of such a code. So, this is "standard nazi" question, not a question "how can I ...". (Has my question to be moved to other stackoverflow site?)
UPDATE2&4: What about destructors? Supposed semantics of this code is "Base instance is (destructively) updated by slice of Derived instance". Let us assume, for the sake of simplicity, that Base class has a trivial destructor.
UPDATE3: The most intersting for me is the validity of access via b_ptr->g(4)
You really need to do b_ptr = d_ptr after placement-new of Derived, in case the Base subobject isn't first in the layout of Derived. As written, b_ptr->g(4) evokes undefined behavior.
The rule (3.8 basic.life):
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can
be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied,
and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static
data member whose type is const-qualified or a reference type, and
the original object was a most derived object (1.8) of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).
You also should probably be destroying the old object before reusing its memory, but the standard doesn't mandate this. However failure to do so will leak any resources owned by the old object. The full rule is given in section 3.8 (basic.life) of the Standard:
A program may end the lifetime of any object by reusing the storage which the object occupies or by explicitly calling the destructor for an object of a class type with a non-trivial destructor. For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to
the destructor or if a delete-expression (5.3.5) is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.
You are initializing the same piece of memory twice. That won't end well.
Suppose for example that the Base constructor allocates some memory and stores it in a pointer. The second time through the constructor, the first pointer will be overwritten and the memory leaked.
I think the overwrite is allowed.
If it was me I'd probably call Base::~Base before reusing the storage, so that the lifetime of the original object ended cleanly. But the standard explicitly allows you to reuse the storage without calling the destructor.
I don't believe your access via b_ptr is valid. The lifetime of the Base object has ended.
(See 3.8/4 in either standard for the rules on lifetime.)
And I'm also not entirely convinced that b_ptr must give the same address as was originally returned by the malloc() call.
If you write this code more cleanly, it is easier to see what goes wrong:
void * addr = std::malloc(LARGE_NUMBER);
Base * b = new (addr) Base;
b->foo(); // no problem
Derived * d = new (addr) Derived;
d->bar(); // also fine (#1)
b->foo(); // Error! b no longer points to a Base!
static_cast<Base*>(d)->foo(); // OK
b = d; b->foo(); // also OK
The problem is that at the line marked (#1), b and d point to entirely separate, unrelated things, and since you overwrote the memory of the erstwhile object *b, b is in fact no longer valid at all.
You may have some misguided thoughts about Base* and Derived* being convertible pointer types, but that has nothing to do with the present situation, and for the sake of this example, the two types may as well be entirely unrelated. It is only one the very last two lines that we use the fact that the Derived* is convertible to a Base*, when we perform the actual conversion. But note that this conversion is a genuine value conversion, and d is not the same pointer as static_cast<Base*>(d) (at least as far as the language is concerned).
Finally, let's clean up this mess:
d->~Derived();
std::free(addr);
The opportunity to destroy the original *b has passed, so we may have leaked that.