If I have a class defined like
class A {
protected:
~A(){ }
};
then I can dynamically allocate the individual as well as array of objects like
A* ptr1 = new A;
A* ptr2 = new A[10];
However when I define the constructor for this class
class A {
public:
A(){}
protected:
~A(){ }
};
then I can create individual objects with
A* ptr = new A;
but when I try to dynamically allocate the array of object with
A* ptr = new A[10];
the compilers (gcc 5.1 and Visual Studio 2015) start complaining that A::~A() is inaccessible.
Can anyone explain:
1. Why is there a difference in behavior depending on whether the constructor is defined?
2. When the constructor is defined, why am I allowed to create an individual object but not an array of objects?
Rejecting an array-new with a protected destructor is correct, as per C++11, §5.3.4 ¶17:
If the new-expression creates an object or an array of objects of class type, access and ambiguity control are done for the allocation function, the deallocation function (12.5), and the constructor (12.1). If the new expression creates an array of objects of class type, access and ambiguity control are done for the destructor (12.4).
(emphasis added; almost exactly the same wording is used in C++03, §5.3.4 ¶16; C++14 moves some stuff around, but that doesn't seem to change the core of the issue - see @Baum mit Augen's answer)
That comes from the fact that new[] succeeds only if all the elements have been constructed, and it wants to avoid leaking objects in case one of the constructor calls fails; thus, if it manages to construct - say - the first 9 objects but the 10th fails with an exception, it has to destruct the first 9 before propagating the exception.
Notice that this restriction logically wouldn't be required if the constructor were declared as noexcept, but the standard doesn't seem to make any exception in this regard.
So, here gcc is technically wrong in the first case, which, as far as the standard is concerned, should be rejected as well, although I'd argue that "morally" gcc does the right thing (as in practice there's no way that the default constructor of A can ever throw).
As it turns out, gcc is not correct here. In N4141 (C++14), we have:
If the new-expression creates an array of objects of class type, the destructor is potentially invoked (12.4).
(5.3.4/19 [expr.new]) and
A program is ill-formed if a destructor that is potentially invoked is deleted or not accessible from the context of the invocation.
(12.4/11 [class.dtor]). So both array cases should be rejected. (Clang does get that right, live.)
The reason for that is, as mentioned by others and by my old, incorrect answer, that the construction of elements of class type can potentially fail with an exception. When that happens, the destructor of all fully constructed elements must be invoked, and thus the destructor must be accessible.
That limitation does not apply when allocating a single element with operator new (without the []), because there can be no fully constructed instance of the class if the single constructor call fails.
I'm not a language lawyer (someone intimately familiar with the standard), but I suspect the answer is along the lines of the one given earlier by Baum mit Augen (since deleted, so only those with sufficient reputation can see it).
If the construction of subsequent array elements fails and throws an exception, then the already constructed elements will need to be deleted, requiring access to the destructor.
However, if the constructor is noexcept, this can be ruled out, and access to the destructor should not be required. The fact that gcc and clang both still complain even in this case may well be a compiler bug: the compiler fails to take into account that the constructor is noexcept. Alternatively, the compilers may be conforming, in which case this smells like a defect in the standard.
Related
In a std::vector<T>, the vector owns the allocated storage and it constructs and destructs the Ts. Regardless of T's class hierarchy, std::vector<T> knows that it has only ever created a T, and thus when .pop_back() is called it only has to destroy a T (not some derived class of T). Take the following code:
#include <vector>

struct Bar {
    virtual ~Bar() noexcept = default;
};

struct FooOpen : Bar {
    int a;
};

struct FooFinal final : Bar {
    int a;
};

void popEm(std::vector<FooOpen>& v) {
    v.pop_back();
}

void popEm(std::vector<FooFinal>& v) {
    v.pop_back();
}
https://godbolt.org/z/G5ceGe6rq
The popEm for FooFinal simply reduces the vector's size by one element. This makes sense. But popEm for FooOpen calls the virtual destructor that the class got by extending Bar. Given that FooOpen is not final, if a normal delete fooOpen were called on a FooOpen* pointer, it would need to make a virtual destructor call; but in the case of std::vector, the vector knows that it only ever made a FooOpen and that no derived class of it was constructed. Therefore, couldn't std::vector<FooOpen> treat the class as final and omit the virtual destructor call in pop_back()?
Long story short: the compiler doesn't have enough context information to deduce it: https://godbolt.org/z/roq7sYdvT
Boring part:
The results are similar for all 3: msvc, clang, and gcc, so I guess the problem is general.
I analysed the libstdc++ code just to find pop_back() runs like this:
void pop_back() // a bit more convoluted, but boils down to this
{
    --back;
    back->~T();
}
No surprise there - it's like in C++ textbooks. But it shows the problem: a virtual call to a destructor through a pointer.
What we're looking for is the 'devirtualisation' technique described here: Is final used for optimisation in C++ - it states devirtualisation is 'as-if' behaviour, so it looks like it is open for optimisation if the compiler has enough information to do it.
My opinion:
I meddled with the code a little and I think the optimisation doesn't happen because the compiler cannot deduce that the only objects pointed to by "back" are FooOpen instances. We - humans - know it because we analyse the entire class and see the overall concept of storing the elements in a vector. We know the pointer must point to a FooOpen instance only, but the compiler fails to see it - it only sees a pointer which could point anywhere (the vector allocates an uninitialized chunk of memory whose interpretation is part of the vector's logic, and the pointer is also modified outside the scope of pop_back()). Without knowing the entire concept of vector<>, I don't see how it can be deduced (without analysing the entire class) that the pointer won't point to some descendant of FooOpen defined in another translation unit.
FooFinal doesn't have this problem because it already guarantees no other class can inherit from it, so devirtualisation is safe for objects pointed to by FooFinal* or FooFinal&.
Update
I made several findings which may be useful:
https://godbolt.org/z/3a1bvax4o - devirtualisation can occur for non-final classes as long as there is no pointer arithmetic involved.
https://godbolt.org/z/xTdshfK7v - std::array performs devirtualisation on non-final classes; std::vector fails to do it even if it is constructed and destroyed in the same scope.
https://godbolt.org/z/GvoaKc9Kz - devirtualisation can be enabled using a wrapper.
https://godbolt.org/z/bTosvG658 - destructor devirtualisation can be enabled with a custom allocator. A bit hacky, but transparent to the user. Briefly tested.
Yes, this is a missed optimisation.
Remember that a compiler is a software project, where features have to be written to exist. It may be that the relative overhead of virtual destruction in cases like this is low enough that adding this in hasn't been a priority for the gcc team so far.
It is an open-source project, so you could submit a patch that adds this in.
It feels a lot like § 11.4.7 ¶14 gives some insight into this. As of the latest working draft (N4910, the post-winter-2022 C++ working draft, March 2022):
After executing the body of the destructor and destroying any objects with automatic storage duration
allocated within the body, a destructor for class X calls the destructors for X’s direct non-variant non-static data
members, the destructors for X’s non-virtual direct base classes and, if X is the most derived class (11.9.3), its
destructor calls the destructors for X’s virtual base classes. All destructors are called as if they were referenced
with a qualified name, that is, ignoring any possible virtual overriding destructors in more derived classes.
Bases and members are destroyed in the reverse order of the completion of their constructor (see 11.9.3).
[Note 4 : A return statement (8.7.4) in a destructor might not directly return to the caller; before transferring control
to the caller, the destructors for the members and bases are called. — end note]
Destructors for elements of an array are called in reverse order of their construction (see 11.9).
Also interesting for this topic, § 11.4.6, (17):
In an explicit destructor call, the destructor is specified by a ~ followed by a type-name or decltype-specifier
that denotes the destructor’s class type. The invocation of a destructor is subject to the usual rules for
member functions (11.4.2); that is, if the object is not of the destructor’s class type and not of a class derived
from the destructor’s class type (including when the destructor is invoked via a null pointer value), the program has undefined behavior.
So, as far as the standard cares, the invocation of a destructor is subject to the usual rules for member functions.
This, to me, sounds a lot like destructor calls do so much that compilers are likely unable to determine, at compile time, that a destructor call does "nothing" - as it also calls the destructors of members, and std::vector doesn't know this.
I am learning C++ using the books listed here. Now I came across the following statement from C++ Primer:
When we allocate a block of memory, we often plan to construct objects in that
memory as needed. In this case, we’d like to decouple memory allocation from object
construction.
Combining initialization with allocation is usually what we want when we
allocate a single object. In that case, we almost certainly know the value the object
should have.
(emphasis mine)
The important thing to note here is that C++ Primer seems to suggest that construction is the same as initialization, and that both are different from allocation, which makes sense to me.
Note that I've just quoted selected parts of the complete paragraph to keep the discussion concise and get my point across. You can read the complete paragraph here.
Now, I came across the following statement from class.dtor:
For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior. For an object with a non-trivial destructor, referring to any non-static member or base class of the object after the destructor finishes execution results in undefined behavior.
(emphasis mine)
Now, does the standard specify exactly when (at what point) constructor execution begins?
To give you some more context consider the example:
class A {
public:
A(int)
{
}
};
class B : public A {
int j;
public:
int f()
{
return 4;
}
//------v-----------------> #2
B() : A(f()),
//------------^-----------> #3
j(f())
//------------^-----------> #4
{ //<---------------#5
}
};
int main()
{
    B b; //<----------------- #1
    return 0;
}
My questions are:
At what point does the derived class' constructor B::B() start executing according to the standard? Note that I know that A(f()) is undefined behavior. Does B::B() start executing at point #1, #2, #3, #4, or #5? In other words, does the standard specify exactly when (at what point) constructor execution begins?
Are construction and initialization the same in this example? I understand that in the member initializer list, where we have j(f()), we're initializing the data member j, but does this initialization also imply that the constructor B::B() has begun executing at point #4?
I read in a recent SO post that execution of the derived ctor begins at point #4, so that post also seems to suggest that initialization and construction are the same.
I read many posts before asking this question but I wasn't able to come up with an answer that is right according to the C++ standard.
I also read this which seems to suggest that allocation, initialization and construction are all different:
Allocation
This is the step where memory is allocated for the object.
Initialization
This is the step where the language related object properties are "set". The vTable and any other "language implementation" related operations are done.
Construction
Now that an object is allocated and initialized, the constructor is being executed. Whether the default constructor is used or not depends on how the object was created.
As you can see above, the user claims that all of the mentioned terms are different, as opposed to what is suggested by C++ Primer. So which claim is correct here according to the standard: C++ Primer (which says that construction and initialization are the same) or the above-quoted answer (which says that construction and initialization are different)?
Initialization and construction are somewhat similar, but not the same.
For objects of class types, initialization is done by calling a constructor (there are exceptions, e.g. aggregate initialization doesn't use a constructor; and value-initialization zeros the members before calling a constructor).
Non-class types don't have constructors, so they are initialized without them.
Your C++ Primer quote uses "initialization" and "construction" to refer to the same thing. It would be more correct to call it "initialization" to not limit yourself to class types, and to include other parts of initialization other than a constructor call.
does the standard specify exactly when (at what point) constructor execution begins?
Yes. B b; is default-initialization, which, for classes, calls the default constructor and does nothing else.
So the default-constructor B() is the first thing that's executed here.
As described here, B() calls A() after evaluating its argument f(), then initializes j after evaluating its initializer f(). Finally, it executes its own body, which is empty in your case.
Since the body is executed last, it's a common misunderstanding to think that B() itself is executed after A() and/or after initializing j.
I read in a recent SO post that execution of derived ctor begins at point #4
You should also read my comments to that post, challenging this statement.
There is no distinction between points 1 and 2 in your example. It's the same point. An object's constructor gets invoked when the object gets instantiated, when it comes into existence. Whether it's point 1 or point 2 is immaterial.
What you are separating out here is the allocation of the underlying memory for the object (point 1) and when the new object's constructor begins executing (point 2).
But that's a distinction without a difference. These are not two discrete events in C++ that somehow can be carried out as discrete, separate steps. They are indivisible, they are one and the same. You cannot allocate memory for a C++ object but somehow avoid constructing it. Similarly you cannot construct some object without allocating memory for it first. It's the same thing.
Now, you do have other distractions that can happen here, like employing the services of the placement-new operator (and manually invoking the object's destructor at some point later down the road). That seems to suggest that allocation is something separate, but it's really not. The invocation of placement new is, effectively, implicitly allocating the memory for the new object from the pointer you hand over to it, immediately followed by the object's construction. That's the way to think about it. So allocation+construction is still an indivisible step.
Another important point to understand is that the first thing that every object's constructor does is call the constructors of all objects that it's derived from. So, from a practical aspect, the very first thing that actually happens in the shown code is that A's constructor gets called first, to construct it, then, once its construction is finished, then B's "real" construction takes place.
B() : A(f()),
That's exactly what this says. B's constructor starts executing, and its first order of business is to call A's constructor. B does nothing else until A's constructor finishes. So, technically, B's constructor starts executing first, then A's constructor. But B's constructor does nothing until A's constructor handles its business.
The first thing that B's constructor does is call A's constructor. This is specified in the C++ standard.
B() : j(f()), A(f())
If you try to do this, your compiler will yell at you.
So, to summarize: when you instantiate an object, any object, the very first thing that happens is its constructor gets called. That's it. End of story. Every possible variation here (placement new, PODs with no constructors, the additional song-and-dance routine with virtual dispatch) is all downstream of this, and can be defined as special, and specific, implementations of constructors.
class A;
class B;
//we have void *p pointing to enough free memory initially
std::pair<A,B> *pp=static_cast<std::pair<A,B> *>(p);
new (&pp->first) A(/*...*/);
new (&pp->second) B(/*...*/);
After the code above gets executed, is *pp guaranteed to be in a valid state? I know the answer is true for every compiler I have tested, but the question is whether this is legal according to the standard. In addition, is there any other way to obtain such a pair if A or B is not movable in C++98/03? (Thanks to @StoryTeller-UnslanderMonica: there is a piecewise constructor for std::pair since C++11.)
“Accessing” the members of the non-existent pair object is undefined behavior per [basic.life]/5; pair is never a POD-class (having user-declared constructors), so a pointer to its storage out of lifetime may not be used for its members. It’s not clear whether forming a pointer to the member is already undefined, or if the new is.
Neither is there a way to construct a pair of non-copyable (nor movable, of course) types in C++98 - that's why the piecewise constructor was added along with move semantics.
A simpler question: is using a string literal well defined?
Not even that is, as its lifetime is not defined. You can't use a string literal in conforming code.
So the committee that never took the time to make string literals well defined obviously did not bother specifying which objects of class type can be made to exist by placement new of their subobjects - polymorphic objects obviously cannot be created that way!
The standard did not even bother describing the semantics of unions.
About lifetime, the standard is all over the place, and that isn't just editorial: it reflects a deep disagreement between serious people about what begins a lifetime, what an object is, what an lvalue is, etc.
Notably people have all sorts of false or contradicting intuitions:
an infinite number of objects cannot be created by one call to malloc
an lvalue refers to an object
overlapping objects are against the object model
an unnamed object can only be created by new or by the compiler (temporaries)
...
Consider:
void foo() {
std::vector<std::atomic<int>> foo(10);
...
}
Are the contents of foo now valid? Or do I need to explicitly loop through and initialise them? I have checked on Godbolt and it seems fine; however, the standard seems to be very confused on this point.
The std::vector constructor says it inserts default-inserted instances of std::atomic<int>, which are value initialised via placement new.
I think this effect of value initialisation applies:
2) if T is a class type with a default constructor that is neither user-provided nor deleted (that is, it may be a class with an implicitly-defined or defaulted default constructor), the object is zero-initialized and then it is default-initialized if it has a non-trivial default constructor;
So it seems to me that the atomics are zero-initialised. So the question is, does zero-initialisation of a std::atomic<int> result in a valid object?
I'm going to guess that the answer is "yes in practice but it's not really defined"?
Note: This answer agrees that it is zero-initialised, but doesn't really say if that means that the object is valid.
You are correct to be worried. According to the standard, the atomics have their default constructor called, but they have not been initialized as such. This is because the default constructor doesn't initialize the atomic:
The default-initialized std::atomic<T> does not contain a T object,
and its only valid uses are destruction and initialization by
std::atomic_init
This is somewhat in violation of the normal language rules, and some implementations initialize anyway (as you have noted).
That being said, I would recommend taking the extra step to make 100% sure they are initialized correctly according to standard - after all you are dealing with concurrency where bugs can be extremely hard to track down.
There are many ways to dodge the issue, including using wrapper:
struct int_atomic {
    std::atomic<int> atomic_{0}; // use the 'initializing' constructor
};
Even if the default constructor were called (it isn't, because it's trivial), it doesn't really do anything.
Zero-initialisation obviously cannot be guaranteed to produce a valid atomic; this'll only work if by chance a valid atomic is created by zero-initialising all its members.
And, since atomics aren't copyable, you can't provide an initialisation value in the vector constructor.
You should now loop over the container and std::atomic_init each element. If you need to lock around this, that's fine because you're already synchronising the vector's creation for the same reason.
The rules of C++ say that it's legal, and will work, to copy an object of a POD type using memcpy.
They further say that a POD can't have a (non-trivial) destructor. Why is this and why would the mere addition of a destructor change the class in such a way that using memcpy would not work?
// Perfectly fine to copy using memcpy
struct data
{
int something;
float thing;
};
// Not allowed to copy using memcpy
int var;
struct data
{
int something;
float thing;
~data() { var = 1; }
};
Why would simply adding the destructor make it impossible to memcpy the struct's data? I can't imagine that this would require altering the data layout in any way.
I'm not interested in being told "don't do this"; I have no intention of doing so. I understand I can't do this because "the standard says so", but I'm wondering why the standard says so, as it doesn't seem a necessary restriction to me, and I want to understand the reasons.
EDIT: People seem to be misunderstanding my question. I'm not asking whether it's a good idea to use memcpy. I'm asking what the reasoning is behind making it illegal when there is a non-trivial destructor. I can't see what difference it makes and want to understand why this restriction exists. Most of the reasons I've been given for it being a bad idea apply just as much whether there is a destructor or not.
In layman's terms:
Why would simply adding the destructor make it impossible to memcpy the struct's data?
It doesn't make it impossible, just illegal.
I can't imagine that this would require altering the data layout in any way.
Probably won't, but it's allowed to. Because the class is no longer a POD (i.e. a C struct), it's now a C++ class.
Classes have different rules from PODs. Since we cannot predict how the compiler will go about laying them out, we can no longer reason about the outcome of memcpy.
Non-trivial destructors typically reverse some non-trivial action performed in a constructor (or other member functions) that affect the object state.
memcpy() copies the bits that make up the object. If the behavior of a constructor would give a different set of bits, then the destructor on that object will try to reverse some action that has not actually occurred.
A typical example is a class whose constructors allocate some resource (e.g. memory), whose other member functions ensure that resource remains in a sane state, and whose destructor releases that resource. Copying such an object using memcpy() will copy the handle of that resource, rather than creating a new instance of that resource for the copied object [which is what the copy constructor of such an object typically does]. The destructor - eventually - will be invoked for both the original and copied objects, and the resource will be released twice. For memory (e.g. allocated using C's malloc() or C++'s operator new), releasing twice gives undefined behaviour. For other resources (file handles, mutexes, other system resources) the result varies, but - on most systems - it is generally inadvisable to deallocate a single something twice.
The other problem is that a class may have base classes, or have members, that themselves have non-trivial constructors and destructors. Even if the class itself has a constructor which does nothing, destroying an object invokes destructors of all members and bases. Copying such an object using memcpy() affects those base classes or members in the way I describe above.
Only objects which are trivially copyable can be copied using memcpy. A class with non-trivial destructor is not trivially copyable.
Suppose, for example, your class has a pointer as one of its members. You allocate space for that pointer in the constructor of the class. Now, you can do a variety of things in the non-trivial destructor, like deleting the space you allocated. With memcpy you'll be copying the entire structure bit by bit, so two instances will be trying to delete the same pointer when their destructors are called.
This is because memcpy performs a shallow copy, and if you have a non-trivial dtor it is probably because your object is the owner of some resource; a shallow copy then does not provide the right copy semantics (it duplicates the ownership). Think about a structure with a pointer to something inside: the dtor should (probably) free the resource when the struct disappears, but a shallow copy will leave you with a dangling pointer.
The problem is usually that when you have a destructor which does something, you also should have a copy constructor/assignment operator (look up "Rule of 3" for this).
When you memcpy, you will skip these copy operation and this can have some consequences.
E.g. you have a pointer to an object and delete it in the destructor. You then should also specify a copy operation so the pointed-to object is copied as well. If you use memcpy instead, you have two pointers to the same instance, and the second destruction will cause an error.
The compiler cannot know what you do in your destructor, or whether special behavior is needed, so it's pessimistic and the type is no longer seen as a POD (even if you do nothing in the destructor).
A similar thing happens with the generation of move-assignment/move-constructors when you declare a destructor in a class in c++11.
The problem arises when memory is owned by the class: you should not memcpy a class that owns memory (even if it has no destructor). Take for instance:
https://ideone.com/46gFzw
#include <iostream>
#include <memory>
#include <cstring>

struct A
{
    std::unique_ptr<int> up_myInt;
    A(int val)
    :
        up_myInt(std::make_unique<int>(val))
    {}
};

int main()
{
    A a(1);
    {
        A b(2);
        memcpy(&a, &b, sizeof(A));
        std::cout << *a.up_myInt << std::endl;
        //b gets deleted, and the memory b.up_myInt points to is gone
    }
    std::cout << *a.up_myInt << std::endl;
    return 0;
}
which results in
stdout
2
0
stderr
*** Error in `./prog': double free or corruption (fasttop): 0x08433a20 ***
As b goes out of scope, the data it owned is deleted; a points to the same data, hence fun times. (The same happens if your class contains basically any other STL container, so never ever memcpy an STL container either.)