Is a data race on vptr explicitly illegal? - c++

Before you go further, a note: this is purely a language lawyer question. I wish to get answers based on standard quotes. I am not looking for advice on writing C++ code. Please answer as if I was a compiler writer.
During construction of an object with only exclusive subobjects (#), notably one whose bases are all non-virtual (or whose virtual base classes are each named only once), the dynamic type of an lvalue referring to a base class subobject "increases": it goes from the type of the base to the type of the class whose constructor is running.
(#) A subobject is exclusive when it is a direct subobject of exactly one other object (which may be another subobject or a complete object). A member and a non virtual base are always exclusive.
During destruction, the type decreases (until the end of the body of the destructor of that subobject, where the subobject is gone and has no dynamic type anymore).
[During construction of an object with shared base class subobjects (that is, in a class with distinct base subobjects sharing at least one virtual base), the dynamic type of a base subobject can "disappear" temporarily. I do not wish to discuss such classes here.]
The real question is: What happens if the dynamic type of the object is increased in another thread?
The title of the question, which is a standard C++ question, is expressed using a non-standard term (vptr), which may look contradictory. The reasons are:
There is no requirement that polymorphism be implemented in terms of a vptr, but it (almost?) always is. The one (or many) vptr in an object represents the dynamic type of a polymorphic object.
Data races are defined in terms of read/write operations on a memory location.
The standard text often uses non standard elements "for exposition only" to define standard features. (So, why not use the vptr "for exposition only"?)
The standard does not define the behavior of polymorphic objects (*) directly as a function of their dynamic type; the standard specifies which expressions are allowed during the so-called "lifetime" (after the constructor has completed), inside the body of the constructor of the most derived type (exactly the same expressions are allowed, with the same semantics), and also inside the base class subobject constructors...
(*) Dynamic behavior of polymorphic or dynamic objects(**) include: virtual calls, derived to base conversions, down casts (static_cast or dynamic_cast), typeid of a polymorphic object.
(**) A dynamic object is one such that its class uses the virtual keyword; its constructor is not trivial for that reason.
So the description says: After something has finished, as soon as something started, before something else, etc. some expression is valid and does such and such.
The specification of construction and destruction was written before threads were part of standard C++. So what changed with the standardization of threads? There is one sentence that defines threading behavior (the normative part), [basic.life]/11:
In this subclause, “before” and “after” refer to the “happens before”
relation ([intro.multithread]).
So it's clear that an object is seen as fully constructed iff there is a happens-before relation between the completion of the constructor invocation and the use of the object, and also a happens-before relation between that use of the object and the invocation of the destructor (if it's invoked at all).
But it doesn't say what happens during the construction of derived classes, after a base class subobject has been constructed: obviously there is a race condition if any dynamic property is used for a polymorphic object under construction, but race conditions are not illegal.
[A race condition is a case of non-determinism, and any meaningful use of a mutex, condition variable, or rwlock, many uses of semaphores, many uses of other synchronisation devices, and all uses of atomic primitives introduce a race condition, at least at the level of the modification order of the atomic object. Whether that low-level non-determinism results in unpredictable high-level behavior depends on the way the primitives are used.]
Then the standard draft goes on to say:
[ Note: Therefore, undefined behavior results if an object that is
being constructed in one thread is referenced from another thread
without adequate synchronization. — end note ]
Where is "adequate synchronization" defined?
Is the lack of "adequate synchronization" the moral equivalent of a regular data race: a data race on the vptr, or in standard speak, a data race on the dynamic type?
For simplicity, I wish to restrict the scope of the question to single inheritance, at least as a first step. (The standard is awfully confused about the construction of objects with multiple inheritance anyway.)
This is language lawyer question so I'm not interested in:
whether using an object that is in the process of being constructed in another thread is advisable (it's probably not advisable);
how to use synchronization to reliably fix that race condition;
whether compiler vendors wish to support such a use case (they probably do not and will not);
whether that could possibly work reliably in any real world implementation (it probably will not reliably work in non trivial cases with current implementation).
EDIT: The previous example, instead of illustrating the issue, was a distraction. It caused a very interesting but completely irrelevant discussion in the chat section.
Here is a cleaner example that will not cause the same issue:
#include <atomic>
struct Base1; // forward declaration so the pointer type below is valid
std::atomic<Base1*> shared;
struct Base1 {
virtual void f() {}
};
struct Base2 : Base1 {
virtual void f() {}
Base2 () { shared = (Base1*)this; }
};
struct Der2 : Base2 {
virtual void f() {}
};
void use_shared() {
Base1 *p;
while (! (p = shared.load())); // std::atomic has load(), not get()
p->f();
}
With the consumer/producer logic:
Thread A: new Der2;
Thread B: use_shared();
For reference, original example:
#include <atomic>
struct Base; // forward declaration so the pointer type below is valid
std::atomic<Base*> shared;
struct Base {
virtual void f() {}
Base () { shared = this; }
};
struct Der : Base {
virtual void f() {}
};
void use_shared() {
Base *p;
while (! (p = shared.load())); // std::atomic has load(), not get()
p->f();
}
Consumer/producer logic:
Thread A: new Der;
Thread B: use_shared();
It wasn't clear that this could be used by another thread during the execution of Base constructor, which is an interesting issue but irrelevant to the issue of using a base class subobject while a derived constructor runs in another thread.
Additional information
For reference, the DR that "motivated" the current phrasing (although that explains nothing):
Core Language Defect Report #710

My reading of the standard is that there's a data race and therefore undefined behavior, but the standard addresses it very indirectly.
[basic.life]/1 The lifetime of an object of type T begins when ... its initialization is complete.
When shared = this; is executed, the lifetime of Base object, let alone Der, hasn't started yet.
[basic.life]/6 Before the lifetime of an object has started but after the storage which the object will occupy has been allocated ... any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise ... [t]he program has undefined behavior if ... the pointer is used to access a non-static data member or call a non-static member function of the object.
[basic.life]/11 In this section, “before” and “after” refer to the “happens before” relation (4.7). [ Note: Therefore, undefined behavior results if an object that is being constructed in one thread is referenced from another thread without adequate synchronization. —end note ]
So the default position of [basic.life] is that a call to an object's method that doesn't happen-after its initialization is completed exhibits undefined behavior. But [class.cdtor] may have more to say.
[class.cdtor]/3 Member functions, including virtual functions (13.3), can be called during construction or destruction (15.6.2). When a virtual function is called directly or indirectly from a constructor or from a destructor ...
Thus, [class.cdtor] only addresses the case where the virtual function is called directly or indirectly from the constructor (necessarily on the same thread on which the constructor itself runs). It's silent on the case where a method is called from another thread, as in the example. I take it to mean that [basic.life] controls, and the behavior of the example is undefined.

Related

Missed Optimization: std::vector<T>::pop_back() not qualifying destructor call?

In an std::vector<T> the vector owns the allocated storage and it constructs Ts and destructs Ts. Regardless of T's class hierarchy, std::vector<T> knows that it has only created a T and thus when .pop_back() is called it only has to destroy a T (not some derived class of T). Take the following code:
#include <vector>
struct Bar {
virtual ~Bar() noexcept = default;
};
struct FooOpen : Bar {
int a;
};
struct FooFinal final : Bar {
int a;
};
void popEm(std::vector<FooOpen>& v) {
v.pop_back();
}
void popEm(std::vector<FooFinal>& v) {
v.pop_back();
}
https://godbolt.org/z/G5ceGe6rq
popEm for FooFinal simply reduces the vector's size by one element. This makes sense. But popEm for FooOpen calls the virtual destructor that the class got by extending Bar. Given that FooOpen is not final, a normal delete fooOpen on a FooOpen* pointer would need to call the virtual destructor, but std::vector knows that it only made a FooOpen and that no class derived from it was constructed. Therefore, couldn't std::vector<FooOpen> treat the class as final and omit the virtual destructor call in pop_back()?
Long story short - compiler doesn't have enough context information to deduce it https://godbolt.org/z/roq7sYdvT
Boring part:
The results are similar for all 3: msvc, clang, and gcc, so I guess the problem is general.
I analysed the libstdc++ code just to find pop_back() runs like this:
void pop_back() // a bit more convoluted but boils-down to this
{
--back;
back->~T();
}
Not any surprise. It's like in C++ textbooks. But it shows the problem - virtual call to a destructor from a pointer.
What we're looking for is the 'devirtualisation' technique described here: Is final used for optimisation in C++ - it states devirtualisation is 'as-if' behaviour, so it looks like it is open for optimisation if the compiler has enough information to do it.
My opinion:
I experimented with the code a little, and I think the optimisation doesn't happen because the compiler cannot deduce that the only objects pointed to by "back" are FooOpen instances. We humans know it because we analyse the entire class and see the overall concept of storing the elements in a vector. We know the pointer must point to a FooOpen instance only, but the compiler fails to see it: it only sees a pointer which can point anywhere (the vector allocates an uninitialized chunk of memory whose interpretation is part of the vector's logic, and the pointer is also modified outside the scope of pop_back()). Without knowing the entire concept of vector<>, I don't see how it could be deduced (short of analysing the entire class) that the pointer won't point to some descendant of FooOpen, which can be defined in other translation units.
FooFinal doesn't have this problem because it already guarantees no other class can inherit from it so devirtualisation is safe for objects pointed by FooFinal* or FooFinal&.
Update
I made several findings which may be useful:
https://godbolt.org/z/3a1bvax4o devirtualisation can occur for non-final classes as long as there is no pointer arithmetic involved.
https://godbolt.org/z/xTdshfK7v std::array performs devirtualisation on non-final classes. std::vector fails to do it even if it is constructed and destroyed in the same scope.
https://godbolt.org/z/GvoaKc9Kz devirtualisation can be enabled using wrapper.
https://godbolt.org/z/bTosvG658 destructor devirtualisation can be enabled with allocator. Bit hacky, but is transparent to the user. Briefly tested.
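The allocator idea from the last finding can be sketched as follows. This is a minimal illustration and not necessarily what the linked Compiler Explorer snippet does: the names `ExactAllocator`, `Counted` and `destroyed_after_pop_back` are made up for the example, and the allocator assumes the element type is a class type. It relies on the fact that a qualified destructor call such as `p->U::~U()` suppresses the virtual call mechanism, and that `std::allocator_traits` uses an allocator's own `destroy` member when one exists.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type: polymorphic, not final, counts destructor runs.
struct Counted {
    static int destroyed;
    virtual ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

// Hypothetical allocator (assumption: T is a class type).
template <class T>
struct ExactAllocator {
    using value_type = T;

    ExactAllocator() = default;
    template <class U>
    ExactAllocator(const ExactAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }

    // Qualified destructor call: the scope qualifier suppresses virtual
    // dispatch, so exactly U's destructor is called.
    template <class U>
    void destroy(U* p) noexcept { p->U::~U(); }

    friend bool operator==(const ExactAllocator&, const ExactAllocator&) noexcept {
        return true;
    }
    friend bool operator!=(const ExactAllocator&, const ExactAllocator&) noexcept {
        return false;
    }
};

// Returns how many elements were destroyed by a single pop_back().
int destroyed_after_pop_back() {
    Counted::destroyed = 0;
    std::vector<Counted, ExactAllocator<Counted>> v;
    v.reserve(3);  // avoid reallocation, which would destroy moved-from elements
    v.emplace_back();
    v.emplace_back();
    v.emplace_back();
    v.pop_back();
    assert(v.size() == 2);
    return Counted::destroyed;
}
```

Whether a given compiler then emits a direct call instead of a vtable lookup still has to be checked in the generated assembly; the sketch only shows that the language permits the qualified call.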
Yes, this is a missed optimisation.
Remember that a compiler is a software project, where features have to be written to exist. It may be that the relative overhead of virtual destruction in cases like this is low enough that adding this in hasn't been a priority for the gcc team so far.
It is an open-source project, so you could submit a patch that adds this in.
It feels a lot like § 11.4.7 (14) gives some insight into this. As of latest working draft (N4910 Post-Winter 2022 C++ working draft, Mar. 2022):
After executing the body of the destructor and destroying any objects with automatic storage duration
allocated within the body, a destructor for class X calls the destructors for X’s direct non-variant non-static data
members, the destructors for X’s non-virtual direct base classes and, if X is the most derived class (11.9.3), its
destructor calls the destructors for X’s virtual base classes. All destructors are called as if they were referenced
with a qualified name, that is, ignoring any possible virtual overriding destructors in more derived classes.
Bases and members are destroyed in the reverse order of the completion of their constructor (see 11.9.3).
[Note 4 : A return statement (8.7.4) in a destructor might not directly return to the caller; before transferring control
to the caller, the destructors for the members and bases are called. — end note]
Destructors for elements of an array are called in reverse order of their construction (see 11.9).
Also interesting for this topic, § 11.4.6, (17):
In an explicit destructor call, the destructor is specified by a ~ followed by a type-name or decltype-specifier
that denotes the destructor’s class type. The invocation of a destructor is subject to the usual rules for
member functions (11.4.2); that is, if the object is not of the destructor’s class type and not of a class derived
from the destructor’s class type (including when the destructor is invoked via a null pointer value), the program has undefined behavior.
So, as far as the standard cares, the invocation of a destructor is subject to the usual rules for member functions.
This, to me, sounds a lot like destructor calls do so much that compilers are likely unable to determine, at compile-time, that a destructor call does "nothing" - as it also calls destructors of members, and std::vector doesn't know this.

Is it undefined behavior to run a member function in a separate thread, in parallel to the type's constructor?

This is a scenario you shouldn't ever do, but https://timsong-cpp.github.io/cppwp/class.cdtor#4 states:
Member functions, including virtual functions ([class.virtual]), can be called during construction or destruction ([class.base.init]).
Does this hold if the functions are called in parallel? That is, ignoring the race condition, if the A is in the middle of construction, and frobme is called at some point AFTER the constructor is invoked (e.g. during construction), is that still defined behavior?
#include <thread>
struct A {
void frobme() {}
};
int main() {
char mem[sizeof(A)];
auto t1 = std::thread([mem]() mutable { new(mem) A; });
auto t2 = std::thread([mem]() mutable { reinterpret_cast<A*>(mem)->frobme(); });
t1.join();
t2.join();
}
As a separate scenario, it was also pointed out to me that A's constructor could create multiple threads, and those threads may invoke a member function before A has finished construction, but the ordering of those operations would be more analyzable (you know no races will occur until AFTER the thread is created in the constructor).
There are two issues here: your specific code and your general question.
In your specific code, even in the best possible case scenario (where t2 executes after t1), you have a data race due to the lack of synchronization between creation and use. And that makes your code UB regardless of the order of execution.
In the general question, let's assume that the constructor of a type hands the this pointer off to some other thread, which then calls functions on it, and the hand-off itself is properly synchronized. Would some other thread invoking a member function be considered a data race?
Well, it certainly would be a data race if the other thread invokes a function that reads member values or other data written by the constructor subsequent to the point of the hand-off, or if the constructor accesses members or other data written by the member function being invoked. That is, it is a data race whenever the two pieces of code running concurrently conflict on some memory location.
Assuming that neither of those is the case, then everything should be fine (mostly. It's possible to define A in such a way that your reinterpret_cast doesn't return a usable pointer to the A you created in that storage; you'd need to launder it). An object under construction/destruction can be accessed, but only in certain ways. Stick to those ways, and you should be fine... with one possible catch.
There's nothing in the standard about data races on the completion of an object's initialization, only on conflicts in memory locations. Once the object is fully constructed, the behavior of virtual functions could change, based on changing vtable pointers and such if the dynamic type is a class derived from the class given to the other thread. I don't believe there's a clear statement about this in the section on the object model.
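A minimal sketch of the properly synchronized hand-off described above, under the assumption that the member function touches only data that was initialized before the thread started and that the constructor does not touch that data until the thread has been joined. The names `Worker`, `poke` and `run_handoff` are hypothetical. It avoids the catch about virtual functions by having none, and it does not settle the language-lawyer question; it only shows a shape that contains no conflicting accesses.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Worker {
    std::atomic<int> hits{0};  // initialized before the constructor body runs
    int late = 0;              // assigned by the constructor only after the join

    Worker() {
        // Thread creation synchronizes with the start of the lambda, so the
        // initialization of hits happens before poke() runs.
        std::thread helper([this] { poke(); });
        // join() makes everything poke() did happen before what follows.
        helper.join();
        late = 42;
    }

    // Touches only hits, never late.
    void poke() { hits.fetch_add(1); }
};

int run_handoff() {
    Worker w;
    assert(w.hits.load() == 1);
    return w.hits.load() + w.late;
}
```

The pointer used in the other thread is obtained from the constructor's `this`, which is the case the C++20 rule quoted next treats as the normal one.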
Also, note that C++20 added a special rule to class.cdtor:
During the construction of an object, if the value of the object or any of its subobjects is accessed through a glvalue that is not obtained, directly or indirectly, from the constructor's this pointer, the value of the object or subobject thus obtained is unspecified.
Besides the race condition (which you might be managing with mutexes or similar), you're subject to the usual limitations on an object whose lifetime has not yet started, namely:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways.
See [basic.life] for the full list of operations that are and are not allowed.
In particular, one of the restrictions is that
The program has undefined behavior if:
...
the glvalue is used to call a non-static member function of the object
which clearly forbids your example.
Also [class.cdtor] says:
For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior
and even if you do synchronize to some event triggered after construction begins, this rule will forbid that code:
During the construction of an object, if the value of the object or any of its subobjects is accessed through a glvalue that is not obtained, directly or indirectly, from the constructor's this pointer, the value of the object or subobject thus obtained is unspecified

memcpy derived class to base class, why still called base class function

I am reading Inside the C++ Object Model. In section 1.3
So, then, why is it that, given
Bear b;
ZooAnimal za = b;
// ZooAnimal::rotate() invoked
za.rotate();
the instance of rotate() invoked is the ZooAnimal instance and not that of Bear? Moreover, if memberwise initialization copies the values of one object to another, why is za's vptr not addressing Bear's virtual table?
The answer to the second question is that the compiler intercedes in the initialization and assignment of one class object with another. The compiler must ensure that if an object contains one or more vptrs, those vptr values are not initialized or changed by the source object .
So I wrote the test code below:
#include <stdio.h>
class Base{
public:
virtual void vfunc() { puts("Base::vfunc()"); }
};
class Derived: public Base
{
public:
virtual void vfunc() { puts("Derived::vfunc()"); }
};
#include <string.h>
int main()
{
Derived d;
Base b_assign = d;
Base b_memcpy;
memcpy(&b_memcpy, &d, sizeof(Base));
b_assign.vfunc();
b_memcpy.vfunc();
printf("sizeof Base : %zu\n", sizeof(Base)); // %zu is the conversion for size_t
Base &b_ref = d;
b_ref.vfunc();
printf("b_assign: %x; b_memcpy: %x; b_ref: %x\n",
*(int *)&b_assign,
*(int *)&b_memcpy,
*(int *)&b_ref);
return 0;
}
The result
Base::vfunc()
Base::vfunc()
sizeof Base : 4
Derived::vfunc()
b_assign: 80487b4; b_memcpy: 8048780; b_ref: 8048780
My question is why b_memcpy still called Base::vfunc()
What you are doing is illegal in C++ language, meaning that the behavior of your b_memcpy object is undefined. The latter means that any behavior is "correct" and your expectations are completely unfounded. There's not much point in trying to analyze undefined behavior - it is not supposed to follow any logic.
In practice, it is quite possible that your manipulations with memcpy did actually copy Derived's virtual table pointer to the b_memcpy object, and your experiments with b_ref confirm that. However, when a virtual method is called through an immediate object (as is the case with the b_memcpy.vfunc() call), most implementations optimize away the access to the virtual table and perform a direct (non-virtual) call to the target function. The formal rules of the language state that no legal action can ever make the b_memcpy.vfunc() call dispatch to anything other than Base::vfunc(), which is why the compiler can safely replace this call with a direct call to Base::vfunc(). This is why virtual table manipulations will normally have no effect on the b_memcpy.vfunc() call.
The behavior you've invoked is undefined because the standard says it's undefined, and your compiler takes advantage of that fact. Let's look at g++ for a concrete example. The assembly it generates for the line b_memcpy.vfunc(); with optimizations disabled looks like this:
lea rax, [rbp-48]
mov rdi, rax
call Base::vfunc()
As you can see, the vtable wasn't even referenced. Since the compiler knows the static type of b_memcpy it has no reason to dispatch that method call polymorphically. b_memcpy can't be anything other than a Base object, so it just generates a call to Base::vfunc() as it would with any other method call.
Going a bit further, let's add a function like this:
void callVfunc(Base& b)
{
b.vfunc();
}
Now if we call callVfunc(b_memcpy); we can see different results. Here we get a different result depending on the optimization level at which I compile the code. On -O0 and -O1 Derived::vfunc() is called and on -O2 and -O3 Base::vfunc() is printed. Again, since the standard says the behavior of your program is undefined, the compiler makes no effort to produce a predictable result, and simply relies on the assumptions made by the language. Since the compiler knows b_memcpy is a Base object, it can simply inline the call to puts("Base::vfunc()"); when the optimization level allows for it.
You aren't allowed to do
memcpy(&b_memcpy, &d, sizeof(Base));
- it's undefined behaviour, because b_memcpy and d aren't "plain old data" objects (because they have virtual member functions).
If you wrote:
b_memcpy = d;
then it would print Base::vfunc() as expected.
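In current standard wording, the property that decides whether memcpy between objects is allowed is "trivially copyable" rather than "plain old data". The following sketch shows one way to let the compiler check it; the names `Plain`, `Poly`, `byte_copy` and `demo_plain_copy` are made up for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Plain { int x; };                              // no virtual functions
struct Poly  { virtual void vfunc() {} int x = 0; };  // polymorphic

static_assert(std::is_trivially_copyable<Plain>::value,
              "Plain may be copied with memcpy");
static_assert(!std::is_trivially_copyable<Poly>::value,
              "a class with virtual functions is not trivially copyable");

// Hypothetical helper: refuses to compile for types where memcpy is undefined.
template <class T>
void byte_copy(T& dst, const T& src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only defined for trivially copyable types");
    std::memcpy(&dst, &src, sizeof(T));
}

int demo_plain_copy() {
    Plain a{7};
    Plain b{0};
    byte_copy(b, a);  // fine: Plain is trivially copyable
    // Poly p, q; byte_copy(q, p);  // would fail the static_assert
    assert(b.x == 7);
    return b.x;
}
```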
Any use of a vptr is outside the scope of the standard
Granted, the use of memcpy here has UB
The answers pointing out that any use of memcpy, or other byte manipulation of non-PODs, that is, of any object with a vptr, has undefined behavior are strictly technically correct but do not answer the question. The question is predicated on the existence of a vptr (vtable pointer), which isn't even mandated by the standard: of course the answer will involve facts outside the standard, and the result will not be guaranteed by the standard!
Standard text is not relevant regarding the vptr
The issue is not that you are not allowed to manipulate the vptr; the notion of being allowed by the standard to manipulate anything that is not even described in the standard text is absurd. Of course no standard way to change the vptr will exist, and this is beside the point.
The vptr encodes the type of a polymorphic object
The issue here is not what the standard says about the vptr, the issue is what the vptr represents, and what the standard says about that: the vptr represents the dynamic type of an object. Whenever the result of an operation depends on the dynamic type, the compiler will generate code to use the vptr.
[Note regarding MI: I say "the" vptr (as if there were only one vptr), but when MI (multiple inheritance) is involved, objects can have more than one vptr, each representing the complete object viewed as a particular polymorphic base class type. (A polymorphic class is a class with at least one virtual function.)]
[Note regarding virtual bases: I mention only the vptr, but some compilers insert other pointers to represent aspects of the dynamic type, like the location of virtual base subobjects, and some other compilers use the vptr for that purpose. What is true about the vptr is also true about these other internal pointers.]
So a particular value of the vptr corresponds to a dynamic type: that is the type of most derived object.
Changes of the dynamic type of an object during its lifetime
During construction, the dynamic type changes, and that is why virtual function calls from inside the constructor can be "surprising". Some people say that the rules for calling virtual functions during construction are special, but they are absolutely not: the final overrider is called; that overrider is the one of the class corresponding to the most derived object that has been constructed so far, and in a constructor C::C(arg-list), it is always the class C.
During destruction, the dynamic type changes, in the reverse order. Calls to virtual function from inside destructors follow the same rules.
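The change of dynamic type during construction and destruction can be observed with a virtual call from each constructor and destructor. This is a small sketch; the names `Base`, `Derived`, `trace`, `log_name` and `construction_trace` are chosen for the illustration.

```cpp
#include <cassert>
#include <string>

std::string trace;  // records which overrider ran at each step

void log_name(const char* s) {
    if (!trace.empty()) trace += ' ';
    trace += s;
}

struct Base {
    Base() { log_name(name()); }           // dynamic type is Base here
    virtual ~Base() { log_name(name()); }  // and Base again here
    virtual const char* name() const { return "Base"; }
};

struct Derived : Base {
    Derived() { log_name(name()); }            // dynamic type is now Derived
    ~Derived() override { log_name(name()); }  // still Derived in this body
    const char* name() const override { return "Derived"; }
};

std::string construction_trace() {
    trace.clear();
    {
        Derived d;
        log_name("|");  // marks the point where d is fully constructed
    }
    return trace;
}
```

Calling `construction_trace()` yields `Base Derived | Derived Base`: the type "increases" on the way in and "decreases" in reverse order on the way out.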
What it means when something is left undefined
You can do low-level manipulations that are not sanctioned by the standard. That a behavior is not explicitly defined in the C++ standard does not imply that it is not described elsewhere. Just because the result of a manipulation is explicitly described as having UB (undefined behavior) in the C++ standard does not mean your implementation cannot define it.
You can also use your knowledge of the way compilers work: if strict separate compilation is used, that is, when the compiler can get no information from separately compiled code, every separately compiled function is a "black box". You can use this fact: the compiler will have to assume that anything a separately compiled function could do will be done. Even inside a given function, you can use an asm directive to get the same effect: an asm directive with no constraint can do anything that is legal in C++. The effect is a "forget what you know from code analysis at that point" directive.
The standard describes what can change the dynamic type, and nothing is allowed to change it except construction/destruction, so only an "external" (black box) function that is otherwise allowed to perform construction/destruction can change a dynamic type.
Calling constructors on an existing object is not allowed, except to reconstruct it with the exact same type (and with restrictions) see [basic.life]/8 :
If, after the lifetime of an object has ended and before the storage
which the object occupied is reused or released, a new object is
created at the storage location which the original object occupied, a
pointer that pointed to the original object, a reference that referred
to the original object, or the name of the original object will
automatically refer to the new object and, once the lifetime of the
new object has started, can be used to manipulate the new object, if:
(8.1) the storage for the new object exactly overlays the storage
location which the original object occupied, and
(8.2) the new object is of the same type as the original object
(ignoring the top-level cv-qualifiers), and
(8.3) the type of the original object is not const-qualified, and, if
a class type, does not contain any non-static data member whose type
is const-qualified or a reference type, and
(8.4) the original object was a most derived object ([intro.object])
of type T and the new object is a most derived object of type T (that
is, they are not base class subobjects).
This means that the only case where you could call a constructor (with placement new) and still use the same expressions that used to designate the objects (its name, pointers to it, etc.) are those where the dynamic type would not change, so the vptr would still be the same.
In other words, if you want to overwrite the vptr using low-level tricks, you could; but only if you write the same value.
In other words, don't try to hack the vptr.
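The one permitted form of "reconstruction" quoted from [basic.life]/8 can be sketched like this: the old object is destroyed and a new most derived object of the same type is created in exactly the same storage, so the original name keeps working. The names `Counter` and `recreate_in_place` are hypothetical, and the class deliberately has no const or reference members.

```cpp
#include <cassert>
#include <new>

struct Counter {
    explicit Counter(int v) : n(v) {}
    virtual ~Counter() = default;
    virtual int get() const { return n; }
    int n;
};

int recreate_in_place() {
    Counter c(1);
    assert(c.get() == 1);
    c.~Counter();         // end the lifetime of the original object
    new (&c) Counter(2);  // same type, same storage, most derived object
    // The name c now refers to the new object; its dynamic type (and so, on
    // typical implementations, its vptr value) is the same as before.
    return c.get();
    // At scope exit the implicit destructor call finds a Counter in place.
}
```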

Default to making classes either `final` or give them a virtual destructor?

Classes with non-virtual destructors are a source for bugs if they are used as a base class (if a pointer or reference to the base class is used to refer to an instance of a child class).
With the C++11 addition of a final class, I am wondering if it makes sense to set down the following rule:
Every class must fulfil one of these two properties:
be marked final (if it is not (yet) intended to be inherited from)
have a virtual destructor (if it is (or is intended to) be inherited from)
Probably there are cases where neither of these two options makes sense, but I guess they could be treated as exceptions that should be carefully documented.
The probably most common actual issue attributed to the lack of a virtual destructor is deletion of an object through a pointer to a base class:
struct Base { ~Base(); };
struct Derived : Base { ~Derived(); };
Base* b = new Derived();
delete b; // Undefined Behaviour
A virtual destructor also affects the selection of a deallocation function. The existence of a vtable also influences typeid and dynamic_cast.
If your class isn't used in those ways, there's no need for a virtual destructor. Note that this usage is not a property of a type, neither of type Base nor of type Derived. Inheritance makes such an error possible, while only using an implicit conversion. (With explicit conversions such as reinterpret_cast, similar problems are possible without inheritance.)
By using smart pointers, you can prevent this particular problem in many cases: unique_ptr-like types can restrict conversions to a base class for base classes with a virtual destructor (*). shared_ptr-like types can store a deleter suitable for deleting a shared_ptr<A> that points to a B even without virtual destructors.
(*) Although the current specification of std::unique_ptr doesn't contain such a check for the converting constructor template, it was restrained in an earlier draft, see LWG 854. Proposal N3974 introduces the checked_delete deleter, which also requires a virtual dtor for derived-to-base conversions. Basically, the idea is that you prevent conversions such as:
unique_checked_ptr<Base> p(new Derived); // error
unique_checked_ptr<Derived> d(new Derived); // fine
unique_checked_ptr<Base> b( std::move(d) ); // error
As N3974 suggests, this is a simple library extension; you can write your own version of checked_delete and combine it with std::unique_ptr.
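A rough sketch of such a hand-written deleter, assuming a design where converting the deleter from derived to base is only enabled when the base has a virtual destructor. This is an illustration and not the text of N3974; the type names besides `std::unique_ptr` are made up here. Note that it only rejects the `std::move(d)` conversion: with plain `std::unique_ptr`, construction from a raw `Derived*` still compiles, so the first "error" line above is not covered by this sketch.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

template <class T>
struct checked_delete {
    constexpr checked_delete() noexcept = default;

    // Derived-to-base deleter conversion only if T has a virtual destructor.
    template <class U,
              class = typename std::enable_if<
                  std::is_convertible<U*, T*>::value &&
                  std::has_virtual_destructor<T>::value>::type>
    checked_delete(const checked_delete<U>&) noexcept {}

    void operator()(T* p) const { delete p; }
};

template <class T>
using unique_checked_ptr = std::unique_ptr<T, checked_delete<T>>;

struct PlainBase {};
struct PlainDerived : PlainBase {};
struct VirtualBase { virtual ~VirtualBase() = default; };
struct VirtualDerived : VirtualBase {
    static int destroyed;
    ~VirtualDerived() override { ++destroyed; }
};
int VirtualDerived::destroyed = 0;

static_assert(!std::is_convertible<unique_checked_ptr<PlainDerived>,
                                   unique_checked_ptr<PlainBase>>::value,
              "no conversion without a virtual destructor");
static_assert(std::is_convertible<unique_checked_ptr<VirtualDerived>,
                                  unique_checked_ptr<VirtualBase>>::value,
              "conversion allowed with a virtual destructor");

int checked_ptr_demo() {
    VirtualDerived::destroyed = 0;
    {
        unique_checked_ptr<VirtualDerived> d(new VirtualDerived);
        unique_checked_ptr<VirtualBase> b(std::move(d));  // fine
        assert(!d);
    }  // b deletes through VirtualBase*, the virtual destructor runs
    return VirtualDerived::destroyed;
}
```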
Both suggestions in the OP can have performance drawbacks:
Mark a class as final
This prevents the Empty Base Optimization. If you have an empty class, its size must still be >= 1 byte. As a data member, it therefore occupies space. However, as a base class, it is allowed not to occupy a distinct region of memory of objects of the derived type. This is used e.g. to store allocators in StdLib containers.
C++20 has mitigated this with the introduction of [[no_unique_address]].
Have a virtual destructor
If the class doesn't already have a vtable, this introduces a vtable per class plus a vptr per object (if the compiler cannot eliminate it entirely). Destruction of objects can become more expensive, which can have an impact e.g. because it's no longer trivially destructible. Additionally, this prevents certain operations and restricts what can be done with that type: The lifetime of an object and its properties are linked to certain properties of the type such as trivially destructible.
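Both costs can be made visible with `sizeof` and a type trait. The sizes below are what mainstream ABIs produce and are not guaranteed by the standard, so treat the checks as an assumption about the platform; all type and function names are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};
struct EmptyFinal final {};

struct ViaBase : Empty { int value; };          // empty base can occupy no space
struct ViaMember { EmptyFinal e; int value; };  // a final class must be a member

struct Trivial { int value; };
struct WithVirtualDtor {
    int value;
    virtual ~WithVirtualDtor() = default;
};

static_assert(std::is_trivially_destructible<Trivial>::value, "");
static_assert(!std::is_trivially_destructible<WithVirtualDtor>::value,
              "a virtual destructor is never trivial");

// Assumption: typical ABIs apply the Empty Base Optimization here.
bool final_member_costs_space() {
    return sizeof(ViaBase) == sizeof(int) && sizeof(ViaMember) > sizeof(int);
}

// Assumption: the implementation stores a vptr in each object.
bool virtual_dtor_costs_space() {
    return sizeof(WithVirtualDtor) > sizeof(Trivial);
}

void verify_layout() {
    assert(final_member_costs_space());
    assert(virtual_dtor_costs_space());
}
```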
final prevents extensions of a class via inheritance. While inheritance is typically one of the worst ways to extend an existing type (compared to free functions and aggregation), there are cases where inheritance is the most adequate solution. final restricts what can be done with the type; there should be a very compelling and fundamental reason why I should do that. One cannot typically imagine the ways others want to use your type.
T.C. points out an example from the StdLib: deriving from std::true_type and similarly, deriving from std::integral_constant (e.g. the placeholders). In metaprogramming, we're typically not concerned with polymorphism and dynamic storage duration. Public inheritance is often just the simplest way to implement metafunctions. I do not know of any case where objects of metafunction type are allocated dynamically. If those objects are created at all, it's typically for tag dispatching, where you'd use temporaries.
As an alternative, I'd suggest using a static analyser tool. Whenever you derive publicly from a class without a virtual destructor, you could raise a warning of some sort. Note that there are various cases where you'd still want to derive publicly from some base class without a virtual destructor; e.g. DRY or simply separation of concerns. In those cases, the static analyser can typically be adjusted via comments or pragmas to ignore this occurrence of deriving from a class w/o virtual dtor. Of course, there need to be exceptions for external libraries such as the C++ Standard Library.
Even better, but more complicated is analysing when an object of class A w/o virtual dtor is deleted, where class B inherits from class A (the actual source of UB). This check is probably not reliable, though: The deletion can happen in a Translation Unit different to the TU where B is defined (to derive from A). They can even be in separate libraries.
The question that I usually ask myself is whether an instance of the class may be deleted via its interface. If this is the case, I make the destructor public and virtual. If this is not the case, I make it protected. A class only needs a virtual destructor if the destructor will be invoked through its interface polymorphically.
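The two choices can be sketched as follows. The class names are hypothetical; the traits only confirm what the compiler enforces.

```cpp
#include <cassert>
#include <type_traits>

// Meant to be deleted through the interface: public virtual destructor.
class Shape {
public:
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

class Triangle : public Shape {
public:
    int sides() const override { return 3; }
};

// Not meant to be deleted through the base: protected non-virtual destructor.
// `delete` on a Countable* does not compile outside the hierarchy.
class Countable {
public:
    int count = 0;

protected:
    ~Countable() = default;
};

class Widget : public Countable {};

static_assert(std::has_virtual_destructor<Shape>::value,
              "Shape can be deleted polymorphically");
static_assert(!std::is_destructible<Countable>::value,
              "Countable cannot be destroyed from outside the hierarchy");
static_assert(std::is_destructible<Widget>::value,
              "the derived class itself is destructible as usual");
```

With the protected destructor, the problematic delete through a base pointer is rejected at compile time, so no vtable is needed to make it safe.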
Well, to be strictly clear, it's only if the pointer is deleted or the object is destructed (through the base class pointer only) that the UB is invoked.
There could be some exceptions for cases where the API user cannot delete the object, but other than that, it's generally a wise rule to follow.

PODs and inheritance in C++11. Does the address of the struct == address of the first member?

(I've edited this question to avoid distractions. There is one core question which would need to be cleared up before any other question would make sense. Apologies to anybody whose answer now seems less relevant.)
Let's set up a specific example:
struct Base {
int i;
};
There are no virtual methods and no inheritance; it is generally a very dumb and simple object. Hence it's Plain Old Data (POD) and it falls back on a predictable layout. In particular:
Base b;
&b == reinterpret_cast<Base*>(&b.i);
This is according to Wikipedia (which itself claims to reference the C++03 standard):
A pointer to a POD-struct object, suitably converted using a reinterpret cast, points to its initial member and vice versa, implying that there is no padding at the beginning of a POD-struct.[8]
Now let's consider inheritance:
struct Derived : public Base {
};
Again, there are no virtual methods, no virtual inheritance, and no multiple inheritance. Therefore this is POD also.
Question: Does this fact (Derived is POD in C++11) allow us to say that:
Derived d;
&d == reinterpret_cast<Derived*>(&d.i); // true on g++-4.6
If this is true, then the following would be well-defined:
Base *b = reinterpret_cast<Base*>(malloc(sizeof(Derived)));
free(b); // It will be freeing the same address, so this is OK
I'm not asking about new and delete here - it's easier to consider malloc and free. I'm just curious about the regulations about the layout of derived objects in simple cases like this, and where the initial non-static member of the base class is in a predictable location.
Is a Derived object supposed to be equivalent to:
struct Derived { // no inheritance
Base b; // it just contains it instead
};
with no padding beforehand?
You don't care about POD-ness, you care about standard-layout. Here's the definition, from the standard section 9 [class]:
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.
And the property you want is then guaranteed (section 9.2 [class.mem]):
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
This is actually better than the old requirement, because the ability to reinterpret_cast isn't lost by adding non-trivial constructors and/or destructor.
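For the classes in the question, the guarantee can be checked like this. It is a sketch: the struct Contains and the helper function are illustrative additions, and it assumes that the inherited member i counts as the initial member of Derived, which is how later revisions of the standard spell out the rule.

```cpp
#include <cassert>
#include <type_traits>

struct Base {
    int i;
};

struct Derived : Base {};

// The "no inheritance" variant from the question, for comparison.
struct Contains {
    Base b;
};

static_assert(std::is_standard_layout<Base>::value, "Base is standard-layout");
static_assert(std::is_standard_layout<Derived>::value, "Derived is standard-layout");
static_assert(std::is_standard_layout<Contains>::value, "Contains is standard-layout");

// Returns true if the object and its first member share an address.
bool derived_starts_at_first_member() {
    Derived d;
    d.i = 0;
    return static_cast<void*>(&d) == static_cast<void*>(&d.i) &&
           reinterpret_cast<Derived*>(&d.i) == &d;
}
```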
Now let's move to your second question. The answer is not what you were hoping for.
Base *b = new Derived;
delete b;
is undefined behavior unless Base has a virtual destructor. See section 5.3.5 ([expr.delete])
In the first alternative (delete object), if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the dynamic type of the object to be deleted and the static type shall have a virtual destructor or the behavior is undefined.
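For contrast, here is a sketch of the well-defined case, where the static type does have a virtual destructor. The counter and the function name are illustrative only.

```cpp
#include <cassert>

int derived_destructor_calls = 0;  // illustrative counter

struct Base {
    virtual ~Base() = default;  // virtual: deleting through Base* is defined
};

struct Derived : Base {
    ~Derived() override { ++derived_destructor_calls; }
};

// Deletes a Derived through a Base* and reports how often ~Derived ran.
int delete_through_base() {
    Base* b = new Derived;
    delete b;  // dispatches to ~Derived, then ~Base
    return derived_destructor_calls;
}
```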
Your earlier snippet using malloc and free is mostly correct. This will work:
Base *b = new (malloc(sizeof(Derived))) Derived;
free(b);
because the value of pointer b is the same as the address returned from placement new, which is in turn the same address returned from malloc.
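Spelled out with the checks made explicit, the malloc/free variant might look like this. The function name is made up, and the explicit destructor call is included for completeness even though the destructor here is trivial.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct Base {
    int i;
};

struct Derived : Base {};

// Returns true if the Base* obtained from the Derived object compares equal
// to the address returned by malloc.
bool base_pointer_matches_malloc_result() {
    void* raw = std::malloc(sizeof(Derived));
    if (raw == nullptr) {
        return false;
    }
    Derived* d = new (raw) Derived;  // placement new returns the same address
    Base* b = d;                     // implicit derived-to-base conversion
    const bool same = static_cast<void*>(b) == raw;
    d->~Derived();                   // trivial here, shown for completeness
    std::free(raw);                  // free the address malloc handed out
    return same;
}
```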
Presumably your last bit of code is intended to say:
Base *b = new Derived;
delete b; // delete b, not d.
In that case, the short answer is that it remains undefined behavior. The fact that the class or struct in question is POD, standard layout or trivially copyable doesn't really change anything.
Yes, you're passing the right address, and yes, you and I know that in this case the dtor is pretty much a nop -- nonetheless, the pointer you're passing to delete has a different static type than dynamic type, and the static type does not have a virtual dtor. The standard is quite clear that this gives undefined behavior.
From a practical viewpoint, you can probably get away with the UB if you really insist -- chances are pretty good that there won't be any harmful side effects from what you're doing, at least with most typical compilers. Beware, however, that even at best the code is extremely fragile so seemingly trivial changes could break everything -- and even switching to a compiler with really heavy type checking and such could do so as well.
As far as your argument goes, the situation's pretty simple: it basically means the committee probably could make this defined behavior if they wanted to. As far as I know, however, it's never been proposed, and even if it had it would probably be a very low priority item -- it doesn't really add much, enable new styles of programming, etc.
This is meant as a supplement to Ben Voigt's answer, not a replacement.
You might think that this is all just a technicality. That the standard calling it 'undefined' is just a bit of semantic twaddle that has no real-world effects beyond allowing compiler writers to do silly things for no good reason. But this is not the case.
I could see desirable implementations in which:
Base *b = new Derived;
delete b;
resulted in behavior that was quite bizarre. This is because storing the size of your allocated chunk of memory when it is known statically by the compiler is kind of silly. For example:
struct Base {
};
struct Derived : Base {
int an_int;
};
In this case, when delete is called on a Base *, the compiler has every reason (because of the rule you quoted at the beginning of your question) to believe that the size of the data pointed at is 1, not 4. If it, for example, implements a version of operator new that has a separate array in which 1-byte entities are all densely packed, and a different array in which 4-byte entities are all densely packed, it will end up assuming the Base * points to somewhere in the 1-byte entity array when in fact it points somewhere in the 4-byte entity array, and making all kinds of interesting errors for this reason.
I really wish operator delete had been defined to also take a size, and the compiler passed in either the statically known size if operator delete was called on an object with a non-virtual destructor, or the known size of the actual object being pointed at if it were being called as a result of a virtual destructor. Though this would likely have other ill effects and maybe isn't such a good idea (like if there are cases in which operator delete is called without a destructor having been called). But it would make the problem painfully obvious.
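For what it is worth, a class-specific operator delete may already take a std::size_t parameter, and C++14 added sized deallocation for the global operator delete. The sketch below uses hypothetical classes and a member variable added purely for observation. It shows that with a virtual destructor the size passed is that of the dynamic type; without one, deleting through the base pointer stays undefined and nothing can be relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Base {
    virtual ~Base() = default;

    // Class-specific sized deallocation function. It records the size it
    // was given, then forwards to the global operator delete.
    static void operator delete(void* p, std::size_t n) {
        last_size = n;
        ::operator delete(p);
    }

    static std::size_t last_size;  // illustrative, for observation only
};

std::size_t Base::last_size = 0;

struct Derived : Base {
    int payload[8];
};

// Deletes a Derived through a Base* and returns the size that reached
// operator delete.
std::size_t size_seen_by_operator_delete() {
    Base* b = new Derived;
    delete b;  // virtual destructor: deallocation uses the dynamic type
    return Base::last_size;
}
```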
There is lots of discussion on irrelevant issues above. Yes, mainly for C compatibility there are a number of guarantees you can rely on as long as you know what you are doing. All this is, however, irrelevant to your main question. The main question is: Is there any situation where an object can be deleted using a pointer type which doesn't match the dynamic type of the object and where the pointed-to type doesn't have a virtual destructor? The answer is: no, there is not.
The logic for this can be derived from what the run-time system is supposed to do: it gets a pointer to an object and is asked to delete it. It would need to store information on how to call derived class destructors or about the amount of memory the object actually takes if this were to be defined. However, this would imply a possibly quite substantial cost in terms of used memory. For example, if the first member requires very strict alignment, e.g. to be aligned at an 8 byte boundary as is the case for double, adding a size would add an overhead of at least 8 bytes to each allocation. Even though this might not sound too bad, it may mean that only one object instead of two or four fits into a cache line, reducing performance substantially.
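The overhead argument can be made concrete with a small layout sketch. The struct names and the 64-byte cache line are assumptions for illustration; real allocators lay out their bookkeeping differently.

```cpp
#include <cassert>
#include <cstddef>

// An object whose first member needs 8-byte alignment.
struct Payload {
    alignas(8) double value;
};

// Hypothetical layout if a size field had to be stored in front of every
// object: the payload must still start on an 8-byte boundary.
struct WithSizeHeader {
    std::size_t size;
    Payload payload;
};

constexpr std::size_t cache_line = 64;  // assumed cache line size in bytes

constexpr std::size_t objects_per_line_plain = cache_line / sizeof(Payload);
constexpr std::size_t objects_per_line_with_header =
    cache_line / sizeof(WithSizeHeader);

// The header costs at least one full alignment unit per object.
static_assert(sizeof(WithSizeHeader) >= sizeof(Payload) + alignof(Payload),
              "the size header adds at least alignof(Payload) bytes");
```

Under these assumptions fewer objects fit into one cache line once the header is added, which is the effect described above.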