When using C libraries it might be appropriate to derive a class from a C structure and add some methods to operate on it, without adding any data members. For example, you could add a constructor to initialize the members more conveniently. Such objects could then be implicitly upcast and passed to the C APIs.
There might be cases where the API expects an array of the C structures. But is there any guarantee in the C++ language that the derived objects have the same size as the base struct, so that the distances between consecutive array elements come out right?
BTW: None of the suggestions of similar questions matches my question.
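To make the setup concrete, here is a minimal sketch of the pattern in question; c_rect and c_draw are hypothetical stand-ins for whatever the C library actually declares:

#include <cstdio>

// Stand-ins for what a C library might declare:
struct c_rect { int w, h; };
void c_draw(const c_rect* r) { std::printf("%d x %d\n", r->w, r->h); }

// Derived class adding only a convenience constructor, no data members:
struct Rect : c_rect {
    Rect(int w_ = 0, int h_ = 0) { w = w_; h = h_; }
};

int main() {
    Rect r(3, 4);
    c_draw(&r);   // implicit Rect* -> c_rect* conversion, fine for one object
}

The question is whether an array of Rect may also be passed where an array of c_rect is expected.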
In general, there is no such guarantee. In particular, if you introduce virtual member functions, for example, then there will typically be additional memory used for the virtual table pointer.
If we add the additional assumption that the derived class is standard-layout, and that no non-standard features such as "packing" are used, then the size will be the same in practice.
However, even if the size is the same, you technically cannot pretend that an array of derived type is an array of base type. In particular, iterating the "pretended" array with a pointer to base would have undefined behaviour. At least that's how it is within C++. Those operations are presumably performed in C across the API. I really don't know what guarantees there are in that case.
I would recommend that if you need to deal with arrays of the C struct (i.e. the pointer would be incremented or subscripted by the API), then instead of wrapping the individual struct, create a C++ wrapper for the entire array.
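A minimal sketch of that recommendation, assuming a hypothetical C struct c_point and a C API c_api_process that consumes a contiguous array (both names are stand-ins, not real library names):

#include <cstddef>
#include <cstdio>
#include <vector>

// Stand-ins for the C library's declarations:
struct c_point { double x, y; };
void c_api_process(const c_point* pts, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        std::printf("(%f, %f)\n", pts[i].x, pts[i].y);
}

// Wrap the whole array rather than the individual struct: the stored
// elements really are c_point, so no derived-to-base punning is needed.
class PointArray {
public:
    void add(double x, double y) { pts_.push_back(c_point{x, y}); }
    void process() const { c_api_process(pts_.data(), pts_.size()); }
private:
    std::vector<c_point> pts_;
};

int main() {
    PointArray a;
    a.add(1, 2);
    a.add(3, 4);
    a.process();
}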
But is there any guarantee in the C++ language that the derived objects have the same size as the base struct
In general I would expect that the derived class will not add anything to the memory layout, as long as you do not introduce new data members or virtual functions. Using virtual functions results in adding the vtable pointer.
The implementation is also free to add a vtable pointer if you use virtual inheritance. This will also change the layout for most compilers (Clang and GCC, for example, use a vtable in that case!).
But this is all implementation-specific, and I do not know of a guarantee in the C++ standard that the class layout allows you to use a derived class as the base class without a cast operation.
You also have to think of padding of data structures which may be different for the derived class.
Creating something (the derived class) and using it as something different (the base struct) is in general undefined behavior. We are not talking about cast operations! If you cast the derived class to the base class, everything is fine. But packing many instances into a derived-class array and simply using it as a base-class array is undefined behavior.
Related
Let's say I have a class
struct Foo {
    uint32_t x;
    uint32_t y;
};
does the C++ standard make any mention of whether sizeof(Foo) must be the same as sizeof(Bar) if Bar just adds a constructor?
struct Bar {
    uint32_t x;
    uint32_t y;
    Bar(uint32_t a = 1, uint32_t b = 2) : x(a), y(b) {}
};
The reason why I am asking is that Foo is sent across the network as a void* and I cannot change its size, but if possible I would like to add a constructor.
I found some related questions (here and here), but there the answers focus mainly on virtual functions changing the size, and I was looking for something more definite than "[...] all implementations I know of [...]".
PS: To avoid misunderstanding... I am not asking how to construct a Foo and then send it as a void*, nor am I asking for a workaround to make sure the size does not change. I am really just curious whether the standard says anything about sizeof in that specific case.
C++98 only guarantees layout for “plain old data” objects, and those don't permit constructors.
C++11 introduces “standard layout types”, which still guarantee layout, but do permit constructors and methods to be added (and permit non-virtual bases to be added, with some exceptions for empty classes and duplicates).
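In practice you can also have the compiler verify this, so the build fails if a later change ever does affect the layout; a small sketch using the types from the question:

#include <cstdint>
#include <type_traits>

struct Foo {
    std::uint32_t x;
    std::uint32_t y;
};

struct Bar {
    std::uint32_t x;
    std::uint32_t y;
    Bar(std::uint32_t a = 1, std::uint32_t b = 2) : x(a), y(b) {}
};

// Both checks are compile-time only; nothing is emitted into the binary.
static_assert(std::is_standard_layout<Bar>::value, "Bar must stay standard-layout");
static_assert(sizeof(Foo) == sizeof(Bar), "adding members changed the size");

int main() {}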
Actually, the only thing that influences the layout is the data contained in an object -- with one important exception, which I'll come to later. If you add any functions (and constructors are in reality nothing more than a kind of static function with special syntax), you do not influence the layout of the class.
If you have a polymorphic class (a class with at least one virtual function, including a virtual destructor), your class will contain an entry for a vtable (this is not enforced by the standard, but it is the standard way polymorphism is implemented); this is just a pointer to some specific memory location elsewhere. The vtable itself will be modified if you add more virtual functions, but without any influence on the layout of your data containers (which your class instances actually are).
Now the exception mentioned above: if you add a virtual function to a class (or make an existing one virtual, including the destructor) while the class did not have any virtual functions before (i.e. no virtual functions of its own and no inherited ones!), then a vtable pointer is newly added, and the data layout does change! (Analogously, the vtable pointer is removed if you make all functions non-virtual, including all inherited ones.)
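A quick way to see this exception on your own implementation (the exact sizes are not fixed by the standard; on a typical 64-bit ABI this prints something like 4 and 16):

#include <iostream>

struct Plain    { int x; };                     // no virtual functions: no vptr
struct WithVirt { int x; virtual void f() {} }; // first virtual function adds a vptr

int main() {
    std::cout << sizeof(Plain) << ' ' << sizeof(WithVirt) << '\n';
}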
Guaranteed by the standard?
Edit:
From the C++ standard, section 4.5 "The C++ object model" (§ 1):
[...] Note: A function is not an object, regardless of whether or not it occupies storage in the way that objects do. — end note [...]
Next is a deduction of mine: a function (note: no distinction between free and member functions) is not an object, thus cannot be a subobject, thus is not part of the data of an object.
Further (same §):
An object has a type (6.9). Some objects are polymorphic (13.3); the implementation generates information associated with each such object that makes it possible to determine that object’s type during program execution.
(That is the vtables! Note that the standard is not explicit about how they are implemented and does not even mandate them at all; if a compiler vendor finds some alternative, it is free to use it...)
For other objects, the interpretation of the values found therein is determined by the type of the expressions (Clause 8) used to access them.
Well, I couldn't find any hints (so far) on if or how functions influence the layout of a class, neither skimming the standard as a whole nor reading (with special attention) clause 8 (as referenced above) or clause 12 "Classes" (especially 12.2 "Class members").
It seems this is not explicitly specified (though I won't put my hand into the fire on not having overlooked something...). Maybe it is valid to deduce this solely from functions not being objects...
Standard layout classes, as referenced by Jan Husec, provide further guarantees on layout, such as no reordering of members (which is allowed for members of different accessibility), alignment conditions, ...
From the conditions for being a standard-layout class, I deduce that for these, at least, the guarantee applies, as all that is referenced for being layout-compatible is the data members, with no mention of (non-static member) functions (other than virtual ones not being allowed...).
As explained by other answers, the layout can change if a virtual destructor is added.
If your destructor does not need to be virtual, you can go ahead and add it. But say you need a virtual destructor: you can then wrap the struct in another struct and place the destructor there.
Make sure the fields that you want to modify are accessible to the wrapper class. A friend relation should be good enough.
All your inheritance will then go through the wrapper, though. The inner struct is just there to maintain the layout so you can send it over the network.
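One way to arrange that wrapper, as a minimal sketch reusing the Foo from the question (the wrapper name and layout are mine; the answer's friend-based variant would work equally well):

#include <cstdint>

struct Foo {
    std::uint32_t x;
    std::uint32_t y;
};

// The wrapper owns the behaviour; Foo keeps the wire layout untouched.
struct FooWrapper {
    Foo data;                         // sizeof(Foo) is unaffected by anything below
    FooWrapper(std::uint32_t a = 1, std::uint32_t b = 2) : data{a, b} {}
    virtual ~FooWrapper() = default;  // safe base class for further inheritance
    const Foo* wire() const { return &data; }  // pass this across the network
};

int main() {
    FooWrapper w(3, 4);
    const void* packet = w.wire();    // Foo goes out, layout guaranteed
    (void)packet;
}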
Suppose I have three classes like these:
class base {
    //some data
    method();
};

class sub1 : base {
    //some data
    //overrides base method
    method();
};

class sub2 : base {
    //some data
    //overrides base methods
    method();
};
How can I create an array mixed with sub1 and sub2, and then call the subclass method through base?
OK, let's sort this out. First of all, you probably meant virtual method();, probably with a return type, maybe with parameters. Without virtual, base class pointers and references won't know about the overridden method. Second, make the destructor virtual. Do this until you know why you need to (delete (base*) new derived;), then keep doing it until all your neighbourhood knows why you need to. Third, the sad thing is, all standard C++ containers are homogeneous (non-standard heterogeneous container-like objects exist in Boost), so you need a common type that is somehow able to handle all of these types. Common choices are:
Common base class pointer, in your case base*. This conventionally owns the objects and is manually (de)allocated (that is, you need to call new and delete). This is the most common choice (see the sketch after this list). You might try smart pointers later, but let's get the basics right first.
Common base class reference, in your case base&. Common convention is that this doesn't own the object (albeit this is not a language restriction), thus it's mainly used for referring to objects that are stored in another container. Since you need to store them somewhere, I wouldn't opt for this now, but it might come in handy later.
std::variant<> (or boost::variant<>), this is a discriminated union, that is, a class that stores one and only one of the listed items and knows which one it stores. You don't need a common base class, but even if you have one, it's cool because it tends to store objects locally, thus might be faster when you have enough cache.
union, which is like variant but does not know the type being stored. Local storage is guaranteed, and so is UB if you write one field and read another.
Compiler-specific solutions. If you know that your classes are of the same size (in this case, they are) and you know for sure that you have untyped memory, then you might store the base class and it'll 'just work', provided you always take the address and use the -> operator. Note that this is UB squared; I just list it because you'll likely encounter similar code. Also note that simply having a union does not remove the UB in this case: since we don't have access to the virtual table pointer, this can only be done by manually handling virtual functions.
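A minimal sketch of the first option, with the fixes from above applied (virtual method, virtual destructor) and smart pointers standing in for manual new/delete:

#include <iostream>
#include <memory>
#include <vector>

class base {
public:
    virtual ~base() = default;               // virtual dtor: safe delete via base*
    virtual void method() const { std::cout << "base\n"; }
};

class sub1 : public base {
public:
    void method() const override { std::cout << "sub1\n"; }
};

class sub2 : public base {
public:
    void method() const override { std::cout << "sub2\n"; }
};

int main() {
    std::vector<std::unique_ptr<base>> v;    // the "mixed array", via base pointers
    v.push_back(std::make_unique<sub1>());
    v.push_back(std::make_unique<sub2>());
    for (const auto& p : v)
        p->method();                         // dynamic dispatch: prints sub1, sub2
}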
Classes with non-virtual destructors are a source for bugs if they are used as a base class (if a pointer or reference to the base class is used to refer to an instance of a child class).
With the C++11 addition of final classes, I am wondering if it makes sense to set down the following rule:
Every class must fulfil one of these two properties:
be marked final (if it is not (yet) intended to be inherited from)
have a virtual destructor (if it is (or is intended to) be inherited from)
Probably there are cases where neither of these two options makes sense, but I guess they could be treated as exceptions that should be carefully documented.
Probably the most common actual issue attributed to the lack of a virtual destructor is deletion of an object through a pointer to a base class:
struct Base { ~Base(); };
struct Derived : Base { ~Derived(); };
Base* b = new Derived();
delete b; // Undefined Behaviour
A virtual destructor also affects the selection of a deallocation function. The existence of a vtable also influences typeid and dynamic_cast.
If your class isn't used in those ways, there's no need for a virtual destructor. Note that this usage is not a property of a type, neither of type Base nor of type Derived. Inheritance makes such an error possible while only using an implicit conversion. (With explicit conversions such as reinterpret_cast, similar problems are possible without inheritance.)
By using smart pointers, you can prevent this particular problem in many cases: unique_ptr-like types can restrict conversions to a base class for base classes with a virtual destructor (*). shared_ptr-like types can store a deleter suitable for deleting a shared_ptr<A> that points to a B even without virtual destructors.
(*) Although the current specification of std::unique_ptr doesn't contain such a check for the converting constructor template, it was constrained in an earlier draft; see LWG 854. Proposal N3974 introduces the checked_delete deleter, which also requires a virtual dtor for derived-to-base conversions. Basically, the idea is that you prevent conversions such as:
unique_checked_ptr<Base> p(new Derived); // error
unique_checked_ptr<Derived> d(new Derived); // fine
unique_checked_ptr<Base> b( std::move(d) ); // error
As N3974 suggests, this is a simple library extension; you can write your own version of checked_delete and combine it with std::unique_ptr.
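A rough sketch of what such a hand-written deleter could look like (this is my approximation, not the N3974 wording; it catches the converting move shown above, though not construction directly from a raw derived pointer, since that conversion happens before the deleter ever sees the type):

#include <memory>
#include <type_traits>

template <class T>
struct checked_delete {
    checked_delete() = default;
    // Converting from checked_delete<U> is only allowed when deleting a U
    // through a T* is safe: same type, or T has a virtual destructor.
    template <class U,
              class = typename std::enable_if<
                  std::is_convertible<U*, T*>::value &&
                  (std::is_same<T, U>::value ||
                   std::has_virtual_destructor<T>::value)>::type>
    checked_delete(const checked_delete<U>&) {}

    void operator()(T* p) const { delete p; }
};

template <class T>
using unique_checked_ptr = std::unique_ptr<T, checked_delete<T>>;

struct Base { };                 // no virtual dtor
struct Derived : Base { };

int main() {
    unique_checked_ptr<Derived> d(new Derived);   // fine
    // unique_checked_ptr<Base> b(std::move(d));  // error: deleter not convertible
}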
Both suggestions in the OP can have performance drawbacks:
Mark a class as final
This prevents the Empty Base Optimization. If you have an empty class, its size must still be >= 1 byte. As a data member, it therefore occupies space. However, as a base class, it is allowed not to occupy a distinct region of memory of objects of the derived type. This is used e.g. to store allocators in StdLib containers.
C++20 has mitigated this with the introduction of [[no_unique_address]].
Have a virtual destructor
If the class doesn't already have a vtable, this introduces a vtable per class plus a vptr per object (if the compiler cannot eliminate it entirely). Destruction of objects can become more expensive, and it can have further impact because the type is no longer trivially destructible. Additionally, this prevents certain operations and restricts what can be done with the type: the lifetime of an object and its properties are linked to certain properties of the type, such as being trivially destructible.
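The trivially-destructible point can be checked at compile time; a small illustrative sketch:

#include <type_traits>

struct NoVirt   { ~NoVirt() = default; };
struct WithVirt { virtual ~WithVirt() = default; };

// A virtual destructor is by definition non-trivial:
static_assert(std::is_trivially_destructible<NoVirt>::value,    "still trivial");
static_assert(!std::is_trivially_destructible<WithVirt>::value, "no longer trivial");

int main() {}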
final prevents extending a class via inheritance. While inheritance is typically one of the worst ways to extend an existing type (compared to free functions and aggregation), there are cases where inheritance is the most adequate solution. final restricts what can be done with the type; there should be a very compelling and fundamental reason to do that. One typically cannot imagine all the ways others will want to use one's type.
T.C. points out an example from the StdLib: deriving from std::true_type and, similarly, deriving from std::integral_constant (e.g. the placeholders). In metaprogramming, we're typically not concerned with polymorphism and dynamic storage duration. Public inheritance is often just the simplest way to implement metafunctions. I do not know of any case where objects of metafunction type are allocated dynamically. If those objects are created at all, it's typically for tag dispatching, where you'd use temporaries.
As an alternative, I'd suggest using a static analyser tool. Whenever you derive publicly from a class without a virtual destructor, you could raise a warning of some sort. Note that there are various cases where you'd still want to derive publicly from some base class without a virtual destructor; e.g. DRY or simply separation of concerns. In those cases, the static analyser can typically be adjusted via comments or pragmas to ignore this occurrence of deriving from a class w/o virtual dtor. Of course, there need to be exceptions for external libraries such as the C++ Standard Library.
Even better, but more complicated is analysing when an object of class A w/o virtual dtor is deleted, where class B inherits from class A (the actual source of UB). This check is probably not reliable, though: The deletion can happen in a Translation Unit different to the TU where B is defined (to derive from A). They can even be in separate libraries.
The question that I usually ask myself, is whether an instance of the class may be deleted via its interface. If this is the case, I make it public and virtual. If this is not the case, I make it protected. A class only needs a virtual destructor if the destructor will be invoked through its interface polymorphically.
Well, to be strictly clear, it's only if the pointer is deleted or the object is destructed (through the base class pointer only) that the UB is invoked.
There could be some exceptions for cases where the API user cannot delete the object, but other than that, it's generally a wise rule to follow.
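A minimal sketch of the protected, non-virtual half of that rule; deletion through the interface pointer then simply does not compile:

#include <iostream>

class Interface {
public:
    virtual void run() = 0;
protected:
    ~Interface() = default;   // non-virtual, but protected: no delete via Interface*
};

class Impl : public Interface {
public:
    void run() override { std::cout << "running\n"; }
};

int main() {
    Impl x;                   // the concrete type owns the lifetime
    Interface& i = x;
    i.run();
    // delete &i;             // error: Interface::~Interface() is protected
}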
(I've edited this question to avoid distractions. There is one core question which would need to be cleared up before any other question would make sense. Apologies to anybody whose answer now seems less relevant.)
Let's set up a specific example:
struct Base {
    int i;
};
There are no virtual methods, there is no inheritance, and it is generally a very dumb and simple object. Hence it is Plain Old Data (POD) and falls back on a predictable layout. In particular:
Base b;
&b == reinterpret_cast<Base*>(&b.i);
This is according to Wikipedia (which itself claims to reference the C++03 standard):
A pointer to a POD-struct object, suitably converted using a reinterpret cast, points to its initial member and vice versa, implying that there is no padding at the beginning of a POD-struct.[8]
Now let's consider inheritance:
struct Derived : public Base {
};
Again, there are no virtual methods, no virtual inheritance, and no multiple inheritance. Therefore this is POD also.
Question: Does this fact (Derived is POD in C++11) allow us to say that:
Derived d;
&d == reinterpret_cast<Derived*>(&d.i); // true on g++-4.6
If this is true, then the following would be well-defined:
Base *b = reinterpret_cast<Base*>(malloc(sizeof(Derived)));
free(b); // It will be freeing the same address, so this is OK
I'm not asking about new and delete here - it's easier to consider malloc and free. I'm just curious about the rules governing the layout of derived objects in simple cases like this, and whether the initial non-static member of the base class is in a predictable location.
Is a Derived object supposed to be equivalent to:
struct Derived { // no inheritance
    Base b;      // it just contains it instead
};
with no padding beforehand?
You don't care about POD-ness, you care about standard-layout. Here's the definition, from the standard section 9 [class]:
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.
And the property you want is then guaranteed (section 9.2 [class.mem]):
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
This is actually better than the old requirement, because the ability to reinterpret_cast isn't lost by adding non-trivial constructors and/or destructor.
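You can assert both properties directly; a small sketch using the types from the question (offsetof is well-defined here because Derived is standard-layout):

#include <cstddef>
#include <type_traits>

struct Base { int i; };
struct Derived : Base { };

static_assert(std::is_standard_layout<Derived>::value,
              "Derived is standard-layout");
static_assert(offsetof(Derived, i) == 0,
              "the first member sits at the very start, no padding before it");

int main() {}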
Now let's move to your second question. The answer is not what you were hoping for.
Base *b = new Derived;
delete b;
is undefined behavior unless Base has a virtual destructor. See section 5.3.5 ([expr.delete])
In the first alternative (delete object), if the static type of the object to be deleted is different from its dynamic type, the static type shall be a base class of the dynamic type of the object to be deleted and the static type shall have a virtual destructor or the behavior is undefined.
Your earlier snippet using malloc and free is mostly correct. This will work:
Base *b = new (malloc(sizeof(Derived))) Derived;
free(b);
because the value of pointer b is the same as the address returned from placement new, which is in turn the same address returned from malloc.
Presumably your last bit of code is intended to say:
Base *b = new Derived;
delete b; // delete b, not d.
In that case, the short answer is that it remains undefined behavior. The fact that the class or struct in question is POD, standard layout or trivially copyable doesn't really change anything.
Yes, you're passing the right address, and yes, you and I know that in this case the dtor is pretty much a nop -- nonetheless, the pointer you're passing to delete has a different static type than dynamic type, and the static type does not have a virtual dtor. The standard is quite clear that this gives undefined behavior.
From a practical viewpoint, you can probably get away with the UB if you really insist -- chances are pretty good that there won't be any harmful side effects from what you're doing, at least with most typical compilers. Beware, however, that even at best the code is extremely fragile so seemingly trivial changes could break everything -- and even switching to a compiler with really heavy type checking and such could do so as well.
As far as your argument goes, the situation's pretty simple: it basically means the committee probably could make this defined behavior if they wanted to. As far as I know, however, it's never been proposed, and even if it had it would probably be a very low priority item -- it doesn't really add much, enable new styles of programming, etc.
This is meant as a supplement to Ben Voigt's answer, not a replacement.
You might think that this is all just a technicality. That the standard calling it 'undefined' is just a bit of semantic twaddle that has no real-world effects beyond allowing compiler writers to do silly things for no good reason. But this is not the case.
I could see desirable implementations in which:
Base *b = new Derived;
delete b;
resulted in behavior that was quite bizarre. This is because storing the size of your allocated chunk of memory, when it is known statically by the compiler, is kind of silly. For example:
struct Base {
};

struct Derived : Base {
    int an_int;
};
In this case, when delete is called on the Base*, the compiler has every reason (because of the rule you quoted at the beginning of your question) to believe that the size of the data pointed at is 1, not 4. If, for example, it implements a version of operator new that has a separate array in which 1-byte entities are all densely packed, and a different array in which 4-byte entities are all densely packed, it will end up assuming the Base* points somewhere into the 1-byte entity array when in fact it points somewhere into the 4-byte entity array, making all kinds of interesting errors for this reason.
I really wish operator delete had been defined to also take a size, with the compiler passing in either the statically known size if operator delete is called on an object with a non-virtual destructor, or the known size of the actual object being pointed at if it is called as the result of a virtual destructor. Though this would likely have other ill effects and maybe isn't such a good idea (e.g. if there are cases in which operator delete is called without a destructor having been called). But it would make the problem painfully obvious.
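As an aside, a class-specific operator delete taking a size has in fact long existed (and C++14 added a global sized-deallocation overload); a small sketch showing the statically known size being passed:

#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct Tracked {
    int payload;
    static void* operator new(std::size_t sz) {
        std::printf("allocating %zu bytes\n", sz);
        return std::malloc(sz);    // error handling omitted in this sketch
    }
    // Sized member deallocation function: the compiler passes the size
    // it knows for the static type being deleted.
    static void operator delete(void* p, std::size_t sz) {
        std::printf("freeing %zu bytes\n", sz);
        std::free(p);
    }
};

int main() {
    Tracked* t = new Tracked;   // typically prints: allocating 4 bytes
    delete t;                   // calls operator delete(p, sizeof(Tracked))
}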
There is a lot of discussion on irrelevant issues above. Yes, mainly for C compatibility there are a number of guarantees you can rely on, as long as you know what you are doing. All this is, however, irrelevant to your main question. The main question is: is there any situation where an object can be deleted using a pointer type that doesn't match the dynamic type of the object, where the pointed-to type doesn't have a virtual destructor? The answer is: no, there is not.
The logic for this can be derived from what the run-time system is supposed to do: it gets a pointer to an object and is asked to delete it. If this were to be defined, it would need to store information on how to call the derived class's destructor, or about the amount of memory the object actually takes. However, this would imply a possibly quite substantial cost in terms of used memory. For example, if the first member requires very strict alignment, e.g. to be aligned at an 8-byte boundary as is the case for double, adding a size field would add an overhead of at least 8 bytes per allocation. Even though this might not sound too bad, it may mean that only one object instead of two or four fits into a cache line, reducing performance substantially.
How does the conversion between derived and base class occur internally, and how does the compiler know the size of the object, or does it store the size of the object?
For example in the following:
class A
{
public:
    A() : x(2) {}
private:
    int x;
};

class B : public A
{
public:
    B() : A(), y(5) {}
private:
    int y;
};

class C : public B
{
public:
    C() : B(), z(9) {}
private:
    int z;
};

int main()
{
    C *CObj = new C;
    B *pB = static_cast<B*>(CObj);
    delete CObj;
}
Edit: It must have been this:
B BObj = static_cast<B>(*CObj);
You don't have any "derived to base" conversion in your code. What you have in your code is a pointer-to-derived to pointer-to-base conversion. (This conversion does not require any explicit cast, BTW)
B *pB = CObj; // no need for the cast
In order to perform the pointer conversion, there's no need to know the size of the object. So, it is not clear where your reference to "size of the object" comes from.
In fact, in the typical implementation the above conversion for single-inheritance hierarchy of non-polymorphic classes is purely conceptual. I.e. the compiler does not do anything besides simply copying the numerical value of the derived pointer into the base pointer. No extra information is needed to perform this operation. No size, no nothing.
In more complicated situations (like multiple inheritance), the compiler might indeed have to generate code that would adjust the value of the pointer. And it will indeed need to know the sizes of the objects involved. But the sizes involved are always compile-time ones, i.e. they are compile-time constants, meaning that the compiler does immediately know them.
In even more complicated cases, like virtual inheritance, this conversion is normally supported by run-time structures implicitly built into the object, which will include everything deemed necessary. Run-time size of the object might be included as well, if the implementation chooses to do so.
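A small sketch of the multiple-inheritance case, where the numeric pointer value typically does change during the derived-to-base conversion (the actual offsets are ABI-specific, not mandated by the standard):

#include <iostream>

struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };

int main() {
    C obj;
    A* pa = &obj;   // on common ABIs: same address as &obj
    B* pb = &obj;   // on common ABIs: adjusted past the A subobject
    std::cout << static_cast<void*>(&obj) << '\n'
              << static_cast<void*>(pa)   << '\n'
              << static_cast<void*>(pb)   << '\n';
}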
Note that you don't need the static_cast here; it's perfectly legal to "up-cast" a pointer-to-derived-class to a pointer-to-parent-class.
In this example, there is no conversion going on. The pointer value stays the same (i.e. under the hood, CObj and pB point at the same memory, though things get more complex with multiple inheritance). The compiler organises the members of B and C objects in memory so that everything just works. As we're dealing with pointers, the size of the object doesn't matter (that was only relevant when you created a new C).
If you had any virtual methods, then we could talk about vtables and vptrs (http://en.wikipedia.org/wiki/Vtable).
A derived class object has base class subobjects. Specifically, the Standard says in 10.3:
“The order in which the base class subobjects are allocated in the most derived object (1.8) is unspecified”
This means that even though, much of the time, the base subobject is right at the beginning of the derived object, that is not guaranteed. Hence the conversion from Derived* to Base* is completely unspecified and is probably left as a degree of latitude to compiler developers.
I would say that it is important to know the rules of the language and the reasons behind them, rather than worry about how the compiler implements them. As an example, I have seen far too many discussions on VTABLE and VPTR, which are compiler-specific implementation details for achieving dynamic binding. Instead, it helps to know about the concept of the 'unique final overrider', which is enough to understand virtual functions and dynamic binding. The point is to focus on 'what' rather than 'how', because the 'how' is not required most of the time. I say most of the time because in some cases it helps. An example is understanding the concept of 'pointers to members': it helps to know that they are usually implemented as some form of 'offset' rather than as a regular pointer.
How does the conversion between derived and base class internally occurs
Implementation defined.
Impossible to answer unless you tell us which compiler you are using.
But generally not worth knowing or worrying about (unless you are writing a compiler).
and how does compiler knows [editor] size of the object
The compiler knows the size (it has worked out the size of C during compilation).
or does it store the size of object?
The object does not need to know the size and thus it is not stored as part of the class.
The runtime memory management (used via new) may need to know (but that is implementation-defined) so that it can correctly release the memory (but anything it stores will not be stored in the object).
If you have ever done any C, the answer suggests itself.
A memory allocator doesn't care at all about what it is storing. It just has to know which memory ranges have been allocated. It doesn't see the difference between a C and an int[4]. It just has to know how to free the memory range that starts at the given pointer.