will the padding of base class be copied into the derived class? - c++

Recently, I've been reading "inside the c++ object model". It says that the padding used in base class should also be copied into the derived class, in case you want to assign the base class to the derived class. Thus, I run a test under a 64-bit computer:
class A {
public:
int valA;
char a;
};
class B : public A {
public:
char b;
};
class C : public B {
public:
char c;
};
int main(){
std::cout << sizeof(A) << " " << sizeof(B) << " " << sizeof(C)
<< std::endl;
C c;
printf("%p\n%p\n%p\n",&c,&c.b,&c.c);
}
Here is the result:
8 12 12
0x7ffd22c5072c
0x7ffd22c50734
0x7ffd22c50735
So why is C the same size of B? Although it seems that B used the 3 byte padding in A.

So why is C the same size of B?
Because trailing padding of B was reused for C::b. The padding can be reused because B is not a POD (plain old data) class (because it is not a standard layout class).
Although it seems that B used the 3 byte padding in A.
The padding of A cannot be reused for other sub objects of B, because A is a standard layout class and is trivially copyable i.e. A is a POD class.
will the padding of base class be copied into the derived class?
I suppose that you didn't mean to ask about copying, but are rather whether the base class sub object of a derived class will have the same padding as the individual type.
The answer is, as might be deduced from the above: The padding will be same, except the trailing padding may be re-used for other sub objects, unless the base class is a POD, in which case its padding can not be reused.
In the case where the padding may be reused, whether it will be is not specified by the standard, and there are differences between compilers.
Please explain or link to a definition of "standard layout types".
Current standard draft:
[basic.types]
... Scalar types, standard-layout class types ([class.prop]), arrays of such types and cv-qualified versions of these types are collectively called standard-layout types.
[class.prop] (in older versions of the standard, these may be found under [class] directly)
A class S is a standard-layout class if it:
(3.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,
(3.2) has no virtual functions and no virtual base classes,
(3.3) has the same access control for all non-static data members,
(3.4) has no non-standard-layout base classes,
(3.5) has at most one base class subobject of any given type,
(3.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
(3.7) has no element of the set M(S) of types as a base class, where for any type X, M(X) is defined as follows.107 [ Note: M(X) is the
set of the types of all non-base-class subobjects that may be at a
zero offset in X. — end note]
(3.7.1) If X is a non-union class type with no (possibly inherited) non-static data members, the set M(X) is empty.
(3.7.2) If X is a non-union class type with a non-static data member of type X0 that is either of zero size or is the first
non-static data member of X (where said member may be an anonymous
union), the set M(X) consists of X0 and the elements of M(X0).
(3.7.3) If X is a union type, the set M(X) is the union of all M(Ui) and the set containing all Ui, where each Ui is the type of the
ith non-static data member of X.
(3.7.4) If X is an array type with element type Xe, the set M(X) consists of Xe and the elements of M(Xe).
(3.7.5) If X is a non-class, non-array type, the set M(X) is empty.
Item (3.6) applies in this case. Some members of B are not first declared in B. In particular, B::A::valA, and B::A::a are declared first in A. A friendlier way to describe the rule is: The class must have either no direct members, or none of its ancestors must have members. In this case both the base and the derived class have members, so it is not standard layout.

C is the same size as B because on your platform the ABI chooses the use the padding in B to store the 1-byte member C::c. B has 3 bytes of padding at the end because the entire B object has alignment 4 (due to the int member in A).
B is not the same size as A, however, because in this case the ABI apparently does not allow storing B::b in the padding of A even though there is room. This happens when all of A members are public, as they are in your example: if you make any member private, the size of A, B and C will all be 8. I believe this may be for ABI backwards compatibility, rather than motivated by any language in the standard.
I don't know if there is language in the standard which directly allows this (but there doesn't need to be), but it certainly seems that this type of padding re-use is contemplated in the case of inheritance. For example, the documentation for std::memcpy says:
If the objects are potentially-overlapping or not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.
It goes on to define potentially-overlapping:
A subobject is potentially overlapping if it is either
a base class subobject, or
a non-static data member declared with the [[no_unique_address]] attribute.
The second condition applies only in C++20.
This seems to be written to allow padding to be shared: if this clause didn't exist, memcpy on a pointer to a B subclass of C would overwrite the value of C::c which is stored in what is usually padding for B.

Related

Confused about std::is_standard_layout type trait

Standard-layout class describes a standard layout class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions and no virtual base classes,
has the same access control for all non-static data members,
has no non-standard-layout base classes,
has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
given the class as S, has no element of the set M(S) of types as a base class, where M(X) for a type X is defined as:
If X is a non-union class type with no (possibly inherited) non-static data members, the set M(X) is empty.
If X is a non-union class type whose first non-static data member has type X0 (where said member may be an anonymous union), the set M(X) consists of X0 and the elements of M(X0).
If X is a union type, the set M(X) is the union of all M(Ui) and the set containing all Ui, where each Ui is the type of the ith non-static data member of X.
If X is an array type with element type Xe, the set M(X) consists of Xe and the elements of M(Xe).
If X is a non-class, non-array type, the set M(X) is empty.
(numbered for convivence)
However, the following fails:
#include <type_traits>
struct A {
int a;
};
struct B : A {
int b;
};
static_assert(std::is_standard_layout<A>::value, "not standard layout"); // succeeds
static_assert(std::is_standard_layout<B>::value, "not standard layout"); // fails
Demo
I see that 1-4 are true, so is one of points under 5 false? I find the points under 5 a little confusing.
The cppreference description is a little confusing.
The standard (search "standard-layout class is") says:
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.
You can see from the part I emphasized that a class with a non-static data member and a base class that itself has a non-static data member is not standard-layout.
I suspect this point is described as follows in cppreference:
has all non-static data members and bit-fields in the class and its base classes first declared in the same class
The point is that you must be able to cast the address of a standard-layout class object to a pointer to its first member, and back. It boils down to compatibility between C++ and C.
See also: Why is C++11's POD "standard layout" definition the way it is?.

C++ Union, two active members only differing by CV-qualification

If I have a union with two data members of the same type, differing only by CV-qualification:
template<typename T>
union A
{
private:
T x_priv;
public:
const T x_publ;
public:
// Accept-all constructor
template<typename... Args>
A(Args&&... args) : x_priv(args...) {}
// Destructor
~A() { x_priv.~T(); }
};
And I have a function f that declares a union A, thus making x_priv the active member and then reads x_publ from that union:
int f()
{
A<int> a {7};
return a.x_publ;
}
In every compiler I tested there were no errors compiling nor at runtime for both int types and other, more complex, types such as std::string and std::thread.
I went to see on the standard if this was legal behavior and I started on looking at the difference of T and const T:
6.7.3.1 [basic.type.qualifier]
The cv-qualified or cv-unqualified versions of a type are distinct types; however, they shall have the same representation and alignment requirements ([basic.align]).
This means that when declaring a const T it has the exact same representation in memory as a T. But then I found that the standard actually disallows this for some types, which I found weird, as I see no reason for it.
I started my search on accessing non-active members.
It is only legal to access the common initial sequence of T and const T if both are standard-layout types.
10.4.1[class.union]
At most one of the non-static data members of an object of union type can be active at any time [...] [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence ([class.mem]), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem]. — end note ]
The initial sequence is basically the order of the non-static data members with a few exceptions, but since T and const T have the exact same members in the same layout, this means that the common initial sequence of T and const T is all of the members of T.
10.3.22 [class.mem]
The common initial sequence of two standard-layout struct ([class.prop]) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types, either both entities are declared with the no_­unique_­address attribute ([dcl.attr.nouniqueaddr]) or neither is, and either both entities are bit-fields with the same width or neither is a bit-field. [ Example:
And here is where the restrictions come in, it restricts some types from being accessed, even though they have the exact same representation in memory:
10.1.3 [class.prop]
A class S is a standard-layout class if it:
(3.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,
(3.2) has no virtual functions and no virtual base classes,
(3.3) has the same access control for all non-static data members,
(3.4) has no non-standard-layout base classes,
(3.5) has at most one base class subobject of any given type,
(3.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
(3.7) has no element of the set M(S) of types as a base class, where for any type X, M(X) is defined as follows.108 [ Note: M(X) is the set of the types of all non-base-class subobjects that may be at a zero offset in X. — end note ]
(3.7.1) If X is a non-union class type with no (possibly inherited) non-static data members, the set M(X) is empty.
(3.7.2) If X is a non-union class type with a non-static data member of type X_0 that is either of zero size or is the first non-static data member of X (where said member may be an anonymous union), the set M(X) consists of X_0 and the elements of M(X_0).
(3.7.3) If X is a union type, the set M(X) is the union of all M(U_i) and the set containing all U_i, where each U_i is the type of the ith non-static data member of X.
(3.7.4) If X is an array type with element type X_e , the set M(X) consists of X e and the elements of M (X_e).
(3.7.5) If X is a non-class, non-array type, the set M(X) is empty.
My questions is is there any reason for this to not be valid behavior?.
Essentially is it that:
The standard makers forgot to account for this particular case?
I haven't read some part of the standard that allows this behavior?
There's some more specific reason for this not to be valid behavior?
A reason for this to be valid syntax is, for example, having a 'readonly' variable in a class, as such:
struct B;
struct A
{
... // Everything that struct A had before
friend B;
}
struct B
{
A member;
void f() { member.x_priv = 100; }
}
int main()
{
B b;
b.f(); // Modifies the value of member.x_priv
//b.member.x_priv = 100; // Invalid, x_priv is private
int x = b.member.x_publ; // Fine, x_publ is public
}
This way you don't need a getter function, which can cause performance overhead and although most compiler would optimize that away it still increases your class, and to get the variable you'd have to write int x = b.get_x().
Nor would you need a const reference to that variable (as described in this question), which while it works great, it adds size to your class, which can be bad for sufficiently big classes or classes that need to be as small as possible.
And it is weird having to write b.member.x_priv instead of b.x_priv but this would be fixable if we could have private members in anonymous unions then we could rewrite it like this:
struct B
{
union
{
private:
int x_priv;
public:
int x_publ;
friend B;
};
void f() { x_priv = 100; }
}
int main()
{
B b;
b.f(); // Modifies the value of member.x_priv
//b.x_priv = 100; // Invalid, x_priv is private
int x = b.x_publ; // Fine, x_publ is public
}
Another use case might be to give various names to the same data member, lie for example in a Shape, the user might want to refer to the position as either shape.pos, shape.position, shape.cur_pos or shape.shape_pos.
Although this would probably create more problems than it is worth, such a use case might be favorable when for example a name should be deprecated .
Code like this:
struct A { int i; };
struct B { int j; };
union U {
struct A a;
struct B b;
};
int main() {
union U u;
u.a.i = 1;
printf("%d\n", u.b.j);
}
is valid in C. For the sake of backward compatibility, it was considered desirable to ensure that it is also valid in C++. The special rules about common initial sequences of standard-layout structs ensure this backward compatibility. Extending the rule to allow more cases to be well-defined—ones involving non-standard-layout structs—is not necessary for C compatibility, since all structs that can be defined in the common subset of C and C++ are automatically standard-layout structs in C++.
Actually, the C++ rules are a little bit more permissive than required for C compatibility. They allow some cases involving base classes too:
struct A { int i; };
struct B { int j; };
struct C : A { };
struct D : B { };
// C and D have a common initial sequence consisting of C::i and D::j
But in general, structs in C++ can be much more complicated than their C counterparts. They can, for example, have virtual functions and virtual base classes, and those can affect their layout in an implementation-defined manner. For this reason, it's not so easy to make more cases of type punning through unions well-defined in C++. You would really have to sit down with implementers and discuss what the conditions would be such that the committee should mandate that two classes have the same layout for their common initial sequence and not leave it up to the implementation. Currently, that mandate applies only to standard-layout classes.
There are various rules in the standard that are strong enough to imply that T and const T always have the exact same layout even if T is not a standard-layout class. For this reason, it would be possible to make certain forms of type punning between a T member and a const T member of a union well-defined even if T is not standard-layout. However, adding only this very special case to the language is of dubious value and I think it's unlikely that the committee would accept such a proposal unless you have a really compelling use case. Not wanting to provide a getter that returns a const reference, simply because you don't want to write the () to call the getter each time you need access, is unlikely to convince the committee.

What is the purpose of bullet point (7.5) in [class]/7, in C++14?

This is basically a continuation of my prior question.
This is [class]/7 in C++14:
A standard-layout class is a class that:
(7.1) — has no non-static data members of type non-standard-layout class (or array of such types) or reference,
(7.2) — has no virtual functions (10.3) and no virtual base classes (10.1),
(7.3) — has the same access control (Clause 11) for all non-static data members,
(7.4) — has no non-standard-layout base classes,
(7.5) — either has no non-static data members in the most derived class and at most one base class with
non-static data members, or has no base classes with non-static data members, and
(7.6) — has no base classes of the same type as the first non-static data member.
Consider the following snippet:
struct B{ int i; };
struct A : B{ int j; };
A satisfies bullet points (7.1) thru (7.4), but doesn't satisfy (7.5), as A has a non-static data member and has a base class with a non-static data member.
What is the problem with A being a standard-layout class?
Edit
As far as I can understand the accepted answer to the question of which this is being considered a dupe, the snippet above would have undefined behavior, if I tried to cast a pointer to A to the first data member of the base class B and back, because of this sentence written by the OP:
Within a class, members are allocated in increasing addresses according to the declaration order. However C++ doesn't dictate the order of allocation for data members across classes.
But that doesn't seem to answer my question. Suppose for example that in a certain compiler implementation, base B would follow struct A in memory, instead of preceding it. But this would contradict the fact that there is an implicit conversion, from a pointer to a derived class, to a pointer to a base class, according to [conv.ptr]/3:
A prvalue of type “pointer to cv D”, where D is a class type, can be
converted to a prvalue of type “pointer to cv B”, where B is a base
class (Clause 10) of D.
That is, if the base B followed struct A in memory, the above implicit conversion would be invalid.
Directly answering the question as phrased:
The purpose of this bullet is to allow very simple cases of inheritance where only one of the classes has data members.
Data layout for inheritance is unspecified, so the standard could just disallow inheritance altogether, but the standard makes an exception if one class has no data to treat the result still as Standard Layout.

struct alignment C/C++

In c/c++ (I am assuming they are the same in this regard), if I have the following:
struct S {
T a;
.
.
.
} s;
Is the following guaranteed to be true?
(void*)&s == (void*)&s.a;
Or in other words, is there any kind of guarantee that there will be no padding before the first member?
In C, yes, they're the same address. Simple, and straightforward.
In C++, no, they're not the same address. Base classes can (and I would suspect, do) come before all members, and virtual member functions usually add hidden data to the struct somewhere. Even more confusing, a C++ compiler may also rearrange members at will, unless the class is a standard layout type (though I don't know that any compiler does so)
Finally, if the C++ struct is composed of standard layout types, contains no base classes nor virtual functions and all members have the same visibility, and possibly other limitations I forgot, then it falls back on the C rules, and requires the first member to be at the same address as the object itself.
§ 9.2/7
A standard-layout class is a class that:
— has no non-static data members of type non-standard-layout class (or array of such types) or reference,
— has no virtual functions (10.3) and no virtual base classes (10.1),
— has the same access control (Clause 11) for all non-static data members,
— has no non-standard-layout base classes,
— either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
— has no base classes of the same type as the first non-static data member.
§ 9.2/20
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]
Yes, it is.
It is guaranteed there is no padding before the first struct member in C and in C++ (if it is a POD).
C quote:
(C11, 6.7.2.1p15) "There may be unnamed padding within a structure object, but not at its beginning."
C++ quote:
(C++11, 9.2p20) "There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment"

C++ optimise away private variable

Does ISO C++ (11) permit a private non-static class member variable to be optimised away?
This could be detected:
class X { int x; };
assert (sizeof(X) >= sizeof(int));
but I am not aware of a clause that demands the assertion above.
To clarify: (a) Is there a clause in the C++ Standard that ensure the assertion above.
(b) Can anyone think of any other way to detect the elision of x?
[offsetof?]
(c) Is the optimisation permitted anyhow, despite (a) and (b)?
I have a feeling the optimisation could be possible if the class is local to a function but not otherwise (but I'd like to have a definitive citation).
I do not think it is forbidden, but I think it is impractical.
§9 Classes [class]
7/ A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.107
8/ A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class.
... thus class X { int x; }; is a standard-layout struct.
§9.2 Class members [class.mem]
16/ Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).
... thus class X { int x; }; is layout-compatible with struct Y { int y; };.
The unfortunate thing is that layout-compatible is not formally defined in the Standard. However given the use of the word layout it seems the intent is to declare that two layout-compatible types should have the same underlying representation.
Therefore, to be able to remove the x in X one would have to prove that all structures that are layout-compatible (such as Y) are amenable to the same optimization (to keep the layout compatibility). It seems quite... improbable... in any non-trivial program.