Order of storage inside a structure / object


Consider these two cases:
struct customType
{
dataType1 var1;
dataType2 var2;
dataType3 var3;
} ;
customType instance1;
// Assume var1, var2 and var3 were initialized to some valid values.
customType * instance2 = &instance1;
dataType1 firstMemberInsideStruct = (dataType1)(*instance2);
class CustomType
{
public:
dataType1 member1;
dataType2 member2;
returnType1 memberFunction1();
private:
dataType3 member3;
dataType4 member4;
returnType2 memberFunction2();
};
CustomType object;
// Assume member1, member2, member3 and member4 were initialized to some valid values.
CustomType *pointerToAnObject = &object;
dataType1 firstMemberInTheObject = (dataType1) (*pointerToAnObject);
Is it always safe to do this?
I want to know whether the standard specifies any order of storage among:
The elements inside a C structure.
Data members inside an object of a C++ class.

C99 and C++ differ a bit on this.
The C99 standard guarantees that the fields of a struct will be laid out in memory in the order they are declared, and that the fields of two identical structs will have the same offsets. See this question for the relevant sections of the C99 standard. To summarize: the offset of the first field is specified to be zero, but the offsets after that are not specified by the standard. This is to allow C compilers to adjust the offsets of each field so the field will satisfy any memory alignment requirements of the architecture. Because this is implementation-dependent, C provides a standard way to determine the offset of each field using the offsetof macro.
C++ offers this guarantee only for Plain old data (POD). C++ classes that are not plain old data cannot be treated like this. The standard gives the C++ compiler quite a bit of freedom in how it organizes a class when the class uses multiple inheritance, has non-public fields or members, or contains virtual members.
What this means for your examples:
dataType1 firstMemberInsideStruct = (dataType1)(*instance2);
This line is okay only if dataType1, dataType2, and dataType3 are plain old data. If any of them are not, then the customType struct may not have a trivial constructor (or destructor) and this assumption may not hold.
dataType1 firstMemberInTheObject = (dataType1) (*pointerToAnObject);
This line is not safe regardless of whether dataType1 through dataType4 are POD, because the CustomType class has private instance variables. This makes it not a POD class, and so you cannot assume that its first instance variable will be ordered in a particular way.
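As written, the casts in the question convert the struct value itself, which only compiles if a conversion to dataType1 exists; the reasoning above applies to the pointer-based form. The following is a minimal sketch of that form, assuming C++11 and hypothetical concrete types (int, double, char) standing in for dataType1 through dataType3. The struct name and the helper firstMember are illustrative, not part of the original code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical concrete stand-ins for dataType1..dataType3.
struct CustomType {
    int    var1;
    double var2;
    char   var3;
};

// The first-member guarantee only covers standard-layout types (POD in C++03 terms).
static_assert(std::is_standard_layout<CustomType>::value,
              "reading the first member through the object pointer needs standard layout");

// Reads the first member through a pointer to the whole object.
inline int firstMember(CustomType* p) {
    return *reinterpret_cast<int*>(p);
}
```

If the static_assert fails for a given type, the cast should not be relied on and the member should be named directly instead.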

9.0.7
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.
9.2.14
Nonstatic data members of a (non-union) class with the same access
control (Clause 11) are allocated so that later members have higher
addresses within a class object. The order of allocation of non-static
data members with different access control is unspecified (11).
Implementation alignment requirements might cause two adjacent members
not to be allocated immediately after each other; so might
requirements for space for managing virtual functions (10.3) and
virtual base classes (10.1).
9.2.20
A pointer to a standard-layout struct object, suitably converted using
a reinterpret_cast, points to its initial member (or if that member is
a bit-field, then to the unit in which it resides) and vice versa. [
Note: There might therefore be unnamed padding within a
standard-layout struct object, but not at its beginning, as necessary
to achieve appropriate alignment. — end note ]

It's not always safe to do so. If the classes have virtual methods, it most definitely is not. Data members are guaranteed to appear in declaration order within a group that shares the same access level, but the groups themselves can be reordered.
To be safe with these kinds of casts, you should provide a conversion constructor or a cast operator, and not rely on implementation details.
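A cast operator of the kind suggested above might look like the sketch below. It assumes C++11 (for explicit conversion operators) and uses hypothetical concrete member types; the helper firstMember is illustrative only.

```cpp
#include <cassert>

// Hypothetical concrete types stand in for dataType1 and dataType2.
struct CustomType {
    int    var1;
    double var2;

    // Explicit conversion: names the member instead of relying on layout.
    explicit operator int() const { return var1; }
};

// Obtains the "first member" through the conversion operator.
inline int firstMember(const CustomType& value) {
    return static_cast<int>(value);
}
```

Because the operator names var1 directly, the result stays correct even if members are later reordered or the class stops being standard-layout.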

Typically in a C struct members are stored in the order that they are declared. However the elements must be aligned properly. Wikipedia has a good example of how this works.
I will re-iterate here:
If you have the following struct
struct MixedData
{
char Data1;
short Data2;
int Data3;
char Data4;
};
padding is inserted between members where needed to keep each one properly aligned. On a typical platform, char is 1-byte aligned, short is 2-byte aligned, int is 4-byte aligned, and so on.
Thus to make Data2 2-byte aligned, there will be a 1-byte padding inserted between Data1 and Data2.
It is also worth mentioning that there are mechanisms that can change the packing alignment. See #pragma pack.
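To see what a particular compiler actually does with MixedData, the offsets can be queried with offsetof rather than assumed. This is a small sketch; the function name printLayout is made up for illustration, and the printed values are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

struct MixedData {
    char  Data1;
    short Data2;
    int   Data3;
    char  Data4;
};

// Prints the offset of each member and the total size for the current compiler.
inline void printLayout() {
    std::printf("Data1 at offset %zu\n", offsetof(MixedData, Data1));
    std::printf("Data2 at offset %zu\n", offsetof(MixedData, Data2));
    std::printf("Data3 at offset %zu\n", offsetof(MixedData, Data3));
    std::printf("Data4 at offset %zu\n", offsetof(MixedData, Data4));
    std::printf("sizeof(MixedData) = %zu\n", sizeof(MixedData));
}
```

On platforms with a 2-byte short and a 4-byte int, the gap between Data1 and Data2 and the trailing padding after Data4 usually show up directly in these numbers.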

Related

Is it allowed to access a common base class of union members regardless of the stored type?

Consider a union whose members share a common base class:
struct Base {
int common;
};
struct DerivedA : Base {};
struct DerivedB : Base {};
union Union {
DerivedA a;
DerivedB b;
};
No matter what the union "contains" at runtime (i.e., what the last stored value was), as long as it contains something, that something is a subclass of Base. Is there then any way to legally use this idea to access the Base field, without knowing the actual type of the object stored in the union?
Maybe something like:
Base* p = reinterpret_cast<Base*>(&u);
... probably not. Maybe this:
Base* p2 = static_cast<Base *>(&u.a);
Is it legal if u.b was the last stored value?
I know there are special rules about "common initial sequences" that apply for unions, but it isn't clear if there is something similar for base classes.
Clearly it won't work for multiple inheritance, so maybe that's an indication it won't work at all.

Your example exactly as you typed it is in fact valid, but it doesn't allow for many useful changes.
The only valid lvalue-to-rvalue conversion on any part of an inactive member of a union is to access a part of that member's common initial sequence with the active member ([class.mem]/23).
But the common initial sequence is only defined for two standard-layout structs ([class.mem]/20), and there are quite a few rules for what qualifies as a standard-layout struct ([class]/7). Summarizing:
The class may not be polymorphic.
The class may not have more than one base class with the same type.
The class may not have a non-static member of reference type.
All non-static members of the class have the same access control.
All non-static members including inherited members are first declared in the same class.
All base classes and non-static members including inherited members obey all the above rules, recursively.
There are rules that say the first non-static member of a standard-layout struct has the same address as the struct, and that all non-static members of a standard-layout union have the same address as the union. But if any combination of these rules would imply that two objects of the same type must have the same address, the containing struct/union is not standard-layout.
(For an example of this last rule:
struct A {}; // Standard-layout
struct B { A a; }; // Standard-layout (and &b==&b.a)
union U { A a; B b; }; // Not standard-layout: &u.a==&u.b.a ??
struct C { U u; }; // Not standard-layout: U is not.
)
Your DerivedA and DerivedB are both standard-layout, so they are permitted to have a common initial sequence. In fact, that common sequence is the single int member of each, so they are in fact fully layout-compatible (and could therefore be part of a common initial sequence of some other pair of structs containing the two).
One of the trickier things here, though, is the rule about all members belonging to the same class. If you add any non-static member to DerivedA and/or DerivedB, even if you add a member of the same type to both, the changed struct(s) is/are no longer standard-layout at all, so there is no common initial sequence. This limits most of the realistic reasons you would have wanted to use inheritance in this pattern.
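The standard-layout status of these types can be checked at compile time with std::is_standard_layout (C++11). The sketch below repeats the types from the question; DerivedWithMember is a hypothetical variant added here to illustrate the "same class" rule.

```cpp
#include <type_traits>

struct Base {
    int common;
};
struct DerivedA : Base {};
struct DerivedB : Base {};

// Hypothetical variant: a data member declared in the derived class as well.
struct DerivedWithMember : Base {
    int extra;
};

static_assert(std::is_standard_layout<DerivedA>::value,
              "DerivedA adds no data members, so it stays standard-layout");
static_assert(std::is_standard_layout<DerivedB>::value,
              "DerivedB adds no data members, so it stays standard-layout");
static_assert(!std::is_standard_layout<DerivedWithMember>::value,
              "data members in both base and derived class break standard layout");
```

If all three assertions compile, the compiler agrees with the analysis above for these particular definitions.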

Accessing the base class through any of the members that contain that base is legal, provided that the structs being used are standard-layout.
In the example you've provided, the structs are standard-layout, so you can access the base through either u.a or u.b.

How is the memory layout of a class vs. a struct

I come from C programming, where the data in a struct is laid out with the first variable at the top, then the second, the third, and so on.
I am now programming in C++ and using a class instead. I basically want to achieve the same layout, but I also want get/set methods and maybe other methods (I also want to try to do it in a C++ style and maybe learn something new).
Is there a guarantee e.g. that the public variables will be first in memory then the private variable?

Is there a guarantee e.g. that the public variables will be first in
memory then the private variable?
No, such a guarantee is not made - C++11 standard, [class.mem]/14:
Nonstatic data members of a (non-union) class with the same access
control (Clause 11) are allocated so that later members have higher
addresses within a class object. The order of allocation of non-static
data members with different access control is unspecified (11).
So
struct A
{
int i, j;
std::string str;
private:
float f;
protected:
double d;
};
It is only guaranteed that, for a given object of type A,
i has a smaller address than j and
j has a smaller address than str
Note that the class-keys struct and class make no difference to layout whatsoever: the only difference between them is the default access control, which exists only at compile time.
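The ordering guarantee for i, j and str can be observed at run time by comparing member addresses. This is a sketch only; the helpers addressOf and publicMembersInOrder are hypothetical names introduced for the demonstration, and nothing is asserted about where f or d end up.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct A {
    int i, j;
    std::string str;
private:
    float f;
protected:
    double d;
};

// Address of a subobject as an integer, so values can be compared or printed.
template <typename T>
std::uintptr_t addressOf(const T& member) {
    return reinterpret_cast<std::uintptr_t>(&member);
}

// True when the three public members appear in declaration order.
inline bool publicMembersInOrder(const A& a) {
    return addressOf(a.i) < addressOf(a.j) && addressOf(a.j) < addressOf(a.str);
}
```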
It only says the order, but not that the first variable actually start
at the "first address"? Lets assume a class without inheritance.
Yes, but only for standard-layout classes. There is a list of requirements a class must satisfy to be a standard-layout class, one of them being that all members have the same access control.
Quoting C++14 (the same applies for C++11, but the wording is more indirect), [class.mem]/19:
If a standard-layout class object has any non-static data members, its
address is the same as the address of its first non-static data
member. Otherwise, its address is the same as the address of its first
base class subobject (if any). [ Note: There might therefore be
unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. — end note ]
[class]/7:
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member. [110]
[110] This ensures that two subobjects that have the same class type and that belong to the same most derived object are not allocated at the same address (5.10).

First thing first: class and struct in C++ are very much the same - the only difference is that all members before the first access specifier in a class are considered private, while in a struct they are public.
Is there a guarantee e.g. that the public variables will be first in memory then the private variable?
There is no such guarantee. When there is no inheritance, the memory will be allocated to class members in the order in which you declare them within the same access group. It is up to the compiler to decide if the public member variables should be placed ahead of the private / protected ones or vice versa. Like C, C++ can add padding in between class members.
Inheritance makes things more complicated, because data members of the base class need to be placed within the derived class as well. On top of that, there is virtual inheritance and multiple inheritance, with complex rules.
I basically want to achieve the same [layout], but I also want get/set methods and also maybe other methods.
If you make all data members of your class private, and add accessor member functions (that's what C++ calls "methods" from other languages) you would achieve this effect.
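As an illustration of that approach, here is a sketch of a class whose data members are all private and reached through accessors. The class name Point and its members are hypothetical; the static_assert (C++11) confirms that uniform access control keeps the class standard-layout.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical example: every data member is private, so access control is uniform.
class Point {
public:
    int  x() const { return x_; }
    int  y() const { return y_; }
    void setX(int value) { x_ = value; }
    void setY(int value) { y_ = value; }

private:
    int x_ = 0;  // declared first, so it comes first in memory
    int y_ = 0;
};

// Member functions do not affect layout; only data members and bases do.
static_assert(std::is_standard_layout<Point>::value,
              "uniform access control keeps Point standard-layout");
```

Mixing public and private data members in Point would make the static_assert fail, which is the situation the question asks about.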

Is 'this' guaranteed to point to the start of an object in C++?

I want to write an object to a sequential file using fwrite. The class looks like this:
class A{
int a;
int b;
public:
//interface
};
When I write an object to the file, I am wondering whether I could use fwrite( this, sizeof(int), 2, fo) to write the first two integers.
The question is: is this guaranteed to point to the start of the object's data, even if a virtual table pointer may exist at the very beginning of the object? In other words, is the operation above safe?

this provides the address of the object, which is not necessarily the address of the first member. The exception is so-called standard-layout types. From the C++11 Standard:
(9.2/20) A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. — end note ]
This is the definition of a standard-layout type:
(9/7) A standard-layout class is a class that:
— has no non-static data members of type non-standard-layout class (or array of such types) or reference,
— has no virtual functions (10.3) and no virtual base classes (10.1),
— has the same access control (Clause 11) for all non-static data members,
— has no non-standard-layout base classes,
— either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
— has no base classes of the same type as the first non-static data member.[108]
[108] This ensures that two subobjects that have the same class type and that belong to the same most derived object are not allocated at the same address (5.10).
Note that the object type does not have to be a POD – having standard-layout as defined above is sufficient. (PODs all have standard-layout, but in addition, they are trivially constructible, trivially movable and trivially copyable.)
As far as I can tell from your code, your type seems to be standard-layout (make sure access control is the same for all non-static data members). In this case, this will indeed point to the initial member. Regarding using this for the purposes of serialization, the Standard actually says explicitly:
(9/9) [ Note: Standard-layout classes are useful for communicating with code written in other programming languages. Their layout is specified in 9.2. — end note ]
Of course this does not solve all problems of serialization. In particular, you won't get portability of the serialized data (e.g. because of endianness incompatibility).
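One way to sidestep both the layout question and the endianness problem is to encode each member explicitly into a byte buffer and pass that buffer to fwrite. The sketch below assumes the two members are 32-bit integers and picks little-endian order arbitrarily; encode, decode, first, second and kEncodedSize are hypothetical names, not part of the original class.

```cpp
#include <cassert>
#include <cstdint>

class A {
    std::int32_t a;
    std::int32_t b;

public:
    enum { kEncodedSize = 8 };  // two 4-byte fields

    A(std::int32_t a_, std::int32_t b_) : a(a_), b(b_) {}
    std::int32_t first() const { return a; }
    std::int32_t second() const { return b; }

    // Writes both members into out[0..7] in little-endian byte order.
    void encode(unsigned char* out) const {
        putLE(out, a);
        putLE(out + 4, b);
    }

    // Rebuilds an object from 8 bytes produced by encode().
    static A decode(const unsigned char* in) {
        return A(getLE(in), getLE(in + 4));
    }

private:
    static void putLE(unsigned char* out, std::int32_t value) {
        const std::uint32_t u = static_cast<std::uint32_t>(value);
        for (int i = 0; i < 4; ++i)
            out[i] = static_cast<unsigned char>((u >> (8 * i)) & 0xFFu);
    }
    static std::int32_t getLE(const unsigned char* in) {
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i)
            u |= static_cast<std::uint32_t>(in[i]) << (8 * i);
        return static_cast<std::int32_t>(u);
    }
};
```

The buffer can then be written with something like fwrite(buf, 1, A::kEncodedSize, fo), and the file contents no longer depend on how the compiler lays out A.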

No it's not. You could use fwrite(&a, sizeof(int), 2, fo), but you shouldn't either. Just strolling over raw memory is seldom a good idea when it comes to safety, because you should not rely on specific memory layouts. Someone could introduce another variable c between a and b, without noticing that he's breaking your code. If you want to access your variables, do that explicitly. Don't just access the memory where you think the variables are or where they once were the last time you checked.

Many answers have correctly said "No". Here is some code that demonstrates why this is not guaranteed to point to the start of the object:
#include <iostream>
class A {
public: virtual void value1() { std::cout << this << "\n"; }
};
class B {
public: virtual void value2() { std::cout << this << "\n"; }
};
class C : public A, public B {};
int main() {
C* c = new C();
A* a = c; // implicit derived-to-base conversion
B* b = c; // may be adjusted to point at the B subobject
a->value1();
b->value2();
delete c;
return 0;
}
Please note the use of this in the virtual methods.
The output can (depending on the compiler) show you that pointers a and b are different. Most likely, a will point to the start of the object, but b will not. The problem appears most easily when multiple inheritance is in use.

Writing an object to a file using fwrite is a very bad idea for many reasons.
For example if your class contains an std::vector<int> you would be saving pointers to the integers, not the integers.
For "higher-level" reasons (alignment, versioning, binary compatibility) it's also a bad idea in most cases even in C and even when the members are just simple native types.
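The std::vector case can be detected at compile time: a class that owns heap memory is not trivially copyable, so its raw bytes do not contain the data it refers to. A small sketch, assuming C++11 and two hypothetical types Flat and Holder:

```cpp
#include <type_traits>
#include <vector>

// Hypothetical type whose bytes are its whole value.
struct Flat {
    int a;
    int b;
};

// Hypothetical type that only holds pointers to its elements.
struct Holder {
    std::vector<int> values;
};

static_assert(std::is_trivially_copyable<Flat>::value,
              "Flat can be copied byte for byte");
static_assert(!std::is_trivially_copyable<Holder>::value,
              "Holder's bytes hold pointers, not the integers themselves");
```

A check like this rules out the worst cases, but it says nothing about the alignment, versioning and binary-compatibility concerns mentioned above.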

Safety of casting between pointers of two identical classes?

Let's say I have two different classes, both represent 2D coordinate data in the same internal way like the following:
class LibA_Vertex{
public:
// ... constructors and various methods, operator overloads
float x, y;
};
class LibB_Vertex{
public:
// ... same usage and internal data as LibA, but with different methods
float x, y;
};
void foobar(){
LibA_Vertex * verticesA = new LibA_Vertex[1000];
verticesA[50].y = 9;
LibB_Vertex * verticesB = reinterpret_cast<LibB_Vertex*>( verticesA );
print(verticesB[50].y); // should output a "9"
}
Given the two classes being identical and the function above, can I reliably count on this pointer conversion working as expected in every case?
(The background is that I need an easy way of trading vertex arrays between two separate libraries that have identical Vertex classes, and I want to avoid needlessly copying arrays.)

C++11 added a concept called layout-compatible which applies here.
Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).
where
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.
A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class.
A standard-layout union is a standard-layout class defined with the class-key union.
Finally
Pointers to cv-qualified and cv-unqualified versions (3.9.3) of layout-compatible
types shall have the same value representation and alignment requirements (3.11).
Which guarantees that reinterpret_cast can turn a pointer to one type into a pointer to any layout-compatible type.
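The preconditions named in the quoted text can be checked at compile time before any cast is attempted. The sketch below uses stripped-down stand-ins for the two vertex classes and a hypothetical wrapper asLibB; it verifies layout properties only and does not by itself settle whether reading through the converted pointer is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for the two libraries' vertex classes.
class LibA_Vertex {
public:
    float x, y;
};
class LibB_Vertex {
public:
    float x, y;
};

static_assert(std::is_standard_layout<LibA_Vertex>::value &&
              std::is_standard_layout<LibB_Vertex>::value,
              "both vertex classes must be standard-layout");
static_assert(sizeof(LibA_Vertex) == sizeof(LibB_Vertex) &&
              alignof(LibA_Vertex) == alignof(LibB_Vertex),
              "size and alignment must match for array indexing to line up");
static_assert(offsetof(LibA_Vertex, x) == offsetof(LibB_Vertex, x) &&
              offsetof(LibA_Vertex, y) == offsetof(LibB_Vertex, y),
              "corresponding members must sit at the same offsets");

// Keeps the conversion in one place so it can be replaced by a copy if needed.
inline LibB_Vertex* asLibB(LibA_Vertex* vertices) {
    return reinterpret_cast<LibB_Vertex*>(vertices);
}
```

If either library later changes its vertex class, the assertions fail at compile time instead of producing silently wrong coordinates.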

I would wrap that conversion up in a class (so that if you need to change platform or something, it's at least localised in one spot) but yes it should be possible.
You'll want to use reinterpret_cast, not static_cast as well.

Theoretically this is undefined behavior. However, it may work on certain systems/platforms.
I would suggest that you try to merge the two classes into one, i.e.
class Lib_Vertex{
// data (which is exactly same for both classes)
public:
// methods for LibA_Vertex
// methods for LibB_Vertex
};
Adding methods into a class will not affect its size. You may have to change your design a bit but it's worth it.

Technically this is undefined behavior. In reality, if the same compiler was used to compile both classes, they'll have the same layout in memory if the fields are declared in the same order, have the same types and the same access level.

In a class with no virtual methods or superclass, is it safe to assume (address of first member variable) == this?

I made a private API that assumes that the address of the first member-object in the class will be the same as the class's this-pointer... that way the member-object can trivially derive a pointer to the object that it is a member of, without having to store a pointer explicitly.
Given that I am willing to make sure that the container class won't inherit from any superclass, won't have any virtual methods, and that the member-object that does this trick will be the first member object declared, will that assumption hold valid for any C++ compiler, or do I need to use the offsetof() operator (or similar) to guarantee correctness?
To put it another way, the code below does what I expect under g++, but will it work everywhere?
class MyContainer
{
public:
MyContainer() {}
~MyContainer() {} // non-virtual dtor
private:
class MyContained
{
public:
MyContained() {}
~MyContained() {}
// Given that the only place Contained objects are declared is m_contained
// (below), will this work as expected on any C++ compiler?
MyContainer * GetPointerToMyContainer()
{
return reinterpret_cast<MyContainer *>(this);
}
};
MyContained m_contained; // MUST BE FIRST MEMBER ITEM DECLARED IN MyContainer
int m_foo; // other member items may be declared after m_contained
float m_bar;
};

It seems the current standard guarantees this only for POD types.
9.2.17
A pointer to a POD-struct object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]
However, the C++0x standard seems to extend this guarantee to "standard-layout struct object"
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most-derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.
A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class.
The assumption probably holds in practice (the earlier standard simply did not make these distinctions, though this may have been the intention).

It is not guaranteed for non-POD types. C++ Standard 9.2/12:
Nonstatic data members of a (non-union) class declared without an intervening access-specifier are allocated so that later members have higher addresses within a class object. The order of allocation of nonstatic data members separated by an access-specifier is unspecified (11.1). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).
In your case you have a non-POD type, since it contains a user-defined destructor. You can read more about POD types here.

The latest C++ spec draft says this is ok, as long as the class qualifies as a standard layout class, which just requires
has no non-static data members of type non-standard-layout class (or array of such types) or reference,
has no virtual functions (10.3) and no virtual base classes (10.1),
has the same access control (Clause 11) for all non-static data members,
has no non-standard-layout base classes,
either has no non-static data members in the most-derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.
Depending on the definition of MyContained, your class might or might not be standard-layout.
Note that POD classes are the intersection of standard-layout and trivial classes.
