Compile-time Base class pointer offset to Derive class

Compile-time Base class pointer offset to Derive class - c++

class Base1 {
int x;
};
class Base2 {
int y;
};
class Derive : public Base1, public Base2 {
public:
enum {
PTR_OFFSET = ((int) (Base2*)(Derive*)1) - 1,
};
};
But the compiler complains
expected constant expression
Everyone knows that the expression values 4 except the compiler, what goes wrong?
How, then, to get the offset at compile time?

Addressing the immediate compiler error you are seeing in the supplied code, (Base2*)(Derive*)1 will most likely become reinterpret_casts when compiled, and that as DyP wrote as a comment to the question is not a constant expression which is required for enumeration initialization. Some compilers, notably GCC are not as strict on this point and will allow for reinterpret_cast in constant expressions even though it is forbidden by the standard (for further discussion of this see the comments for this GCC bug http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49171 and Constexpr pointer value).
The broader question of identifying at compile time what the layout of an object is and the offsets to its various members is a tricky one without a well-defined answer. The standard give implementers a lot of latitude to pad/pack an object's fields, usually out of alignment considerations (for a good summary see http://www.altdevblogaday.com/2013/05/03/cc-low-level-curriculum-part-11-inheritance/). While the relative ordering of an object's field must be maintained, int y in an instance of Derived need not be at an offset of sizeof(x) from the start of the instance of Derived; the answer is compiler and target architecture dependent.
All of that being said, this sort of structure layout information is determined at compile time and at least on some compilers is made accessible (even if not in portable, standards compliant ways). In the answer to this question, C++ Compile-Time offsetof inside a template, Jesse Good provides some code that on GCC at least will allow one to determine field offsets within a type at compile time. This code will unfortunately not provide the correct offsets for base class members.
A good solution to your problem awaits implementation of compile time reflection support in C++, something for which there is ongoing work as part of a standards working group: https://groups.google.com/a/isocpp.org/forum/#!forum/reflection .

Here is an example that works on Clang.
The approach uses a builtin that is available on both Clang and GCC. I have verified Clang (see Code Explorer link below), but I have not attempted with GCC.
#include <iostream>
/* Byte offsets are numbered here without accounting for padding (will not be correct). */
struct A { uint64_t byte_0, byte_8; uint32_t byte_16; };
struct B { uint16_t byte_20, byte_24; uint8_t byte28; uint64_t byte_29; };
struct C { uint32_t byte_37; uint8_t byte_41; };
struct D { uint64_t byte_42; };
struct E : A, B, C, D {};
template<typename Type, typename Base> constexpr const uintmax_t
offsetByStaticCast() {
constexpr const Type* Type_this = __builtin_constant_p( reinterpret_cast<const Type*>(0x1) )
? reinterpret_cast<const Type*>(0x1)
: reinterpret_cast<const Type*>(0x1);
constexpr const Base* Base_this = static_cast<const Base*>( Type_this );
constexpr const uint8_t* Type_this_bytes = __builtin_constant_p( reinterpret_cast<const uint8_t*>(Type_this) )
? reinterpret_cast<const uint8_t*>(Type_this)
: reinterpret_cast<const uint8_t*>(Type_this);
constexpr const uint8_t* Base_this_bytes = __builtin_constant_p( reinterpret_cast<const uint8_t*>(Base_this) )
? reinterpret_cast<const uint8_t*>(Base_this)
: reinterpret_cast<const uint8_t*>(Base_this);
constexpr const uintmax_t Base_offset = Base_this_bytes - Type_this_bytes;
return Base_offset;
}
int main()
{
std::cout << "Size of A: " << sizeof(A) << std::endl;
std::cout << "Size of B: " << sizeof(B) << std::endl;
std::cout << "Size of C: " << sizeof(C) << std::endl;
std::cout << "Size of D: " << sizeof(D) << std::endl;
std::cout << "Size of E: " << sizeof(E) << std::endl;
/* Actual byte offsets account for padding. */
std::cout << "A offset via offsetByStaticCast<E, A>(): " << offsetByStaticCast<E, A>() << std::endl;
std::cout << "B offset via offsetByStaticCast<E, B>(): " << offsetByStaticCast<E, B>() << std::endl;
std::cout << "C offset via offsetByStaticCast<E, C>(): " << offsetByStaticCast<E, C>() << std::endl;
std::cout << "D offset via offsetByStaticCast<E, D>(): " << offsetByStaticCast<E, D>() << std::endl;
return 0;
}
Output:
Size of A: 24
Size of B: 16
Size of C: 8
Size of D: 8
Size of E: 56
A offset via offsetByStaticCast<E, A>(): 0
B offset via offsetByStaticCast<E, B>(): 24
C offset via offsetByStaticCast<E, C>(): 40
D offset via offsetByStaticCast<E, D>(): 48
Program ended with exit code: 0
Code available on Compiler Explorer: https://godbolt.org/z/Gfe6YK
Based on helpful comments from constexpr and initialization of a static const void pointer with reinterpret cast, which compiler is right?, particularly including the link to a corresponding LLVM commit http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20120130/052477.html

Recently I found that the code ((int) (Base2*)(Derive*)1) - 1 broke down when Derive : public virtual Base2. That is, the offset of virtual base is unknown at compile-time. Therefore, it it forbidden in c++ standard.

Related

What happens after C++ references are compiled？

After compilation, what does the reference become, an address, or a constant pointer?
I know the difference between pointers and references, but I want to know the difference between the underlying implementations.
int main()
{
int a = 1;
int &b = a;
int *ptr = &a;
cout << b << " " << *ptr << endl; // 1 1
cout << "&b: " << &b << endl; // 0x61fe0c
cout << "ptr: " << ptr << endl; // 0x61fe0c
return 0;
}

The pedantic answer is: Whatever the compiler feels like, all that matters is that it works as specified by the language's semantics.
To get the actual answer, you have to look at resulting assembly, or make heavy usage of Undefined Behavior. At that point, it becomes a compiler-specific question, not a "C++ in general" question
In practice, references that need to be stored essentially become pointers, while local references tend to get compiled out of existence. The later is generally the case because the guarantee that references never get reassigned means that if you can see it getting assigned, then you know full well what it refers to. However, you should not be relying on this for correctness purposes.
For the sake of completeness
It is possible to get some insight into what the compiler is doing from within valid code by memcpying the contents of a struct containing a reference into a char buffer:
#include <iostream>
#include <array>
#include <cstring>
struct X {
int& ref;
};
int main() {
constexpr std::size_t x_size = sizeof(X);
int val = 12;
X val_ref = {val};
std::array<unsigned char, x_size> raw ;
std::memcpy(&raw, &val_ref, x_size);
std::cout << &val << std::endl;
std::cout << "0x";
for(const unsigned char c : raw) {
std::cout << std::hex << (int)c;
}
std::cout << std::endl ;
}
When I ran this on my compiler, I got the (endian flipped) address of val stored within the struct.

it heavily depend on compiler maybe compiler decide to optimize the code therefore it will make it value or ..., but as far i know references will compiler like pointer i mean if you see their result assembly they are compiled like pointer.

I don't understand why 'Derived1' requires the same amount of memory as 'Derived3'

In the following code I don't understand why 'Derived1' requires the same amount of memory as 'Derived3'. Also is there any specific significance of size of Derived 4 being 16.
#include <iostream>
using namespace std;
class Empty
{};
class Derived1 : public Empty
{};
class Derived2 : virtual public Empty
{};
class Derived3 : public Empty
{
char c;
};
class Derived4 : virtual public Empty
{
char c;
};
class Dummy
{
char c;
};
int main()
{
cout << "sizeof(Empty) " << sizeof(Empty) << endl;
cout << "sizeof(Derived1) " << sizeof(Derived1) << endl;
cout << "sizeof(Derived2) " << sizeof(Derived2) << endl;
cout << "sizeof(Derived3) " << sizeof(Derived3) << endl;
cout << "sizeof(Derived4) " << sizeof(Derived4) << endl;
cout << "sizeof(Dummy) " << sizeof(Dummy) << endl;
return 0;
}
The output of this code gave was:
sizeof(Empty) 1
sizeof(Derived1) 1
sizeof(Derived2) 8
sizeof(Derived3) 1
sizeof(Derived4) 16
sizeof(Dummy) 1

A class must have a sizeof of 1 or greater otherwise pointer arithmetic would break horribly, and the elements of array of that class would all occupy the same memory.
So sizeof(Derived1) is at least 1, as is sizeof(Empty). Empty base optimisation means that the size of the derived class is effectively zero.
sizeof(Derived3) can also be 1, since the single member is a char. Note that the compiler is again exploiting empty base optimisation here.
The polymorphic classes (i.e. those containing virtual keywords), have larger sizes due to your compiler implementing the requirements for polymorphic behaviour.

In the following code I don't understand why 'Derived1' requires the same amount of memory as 'Derived3'.
Unlike all other objects, empty base-sub-objects are allowed to share their address with their sibling objects. Therefore an empty base does not need to occupy any memory whatsoever.
Besides the empty base, Derived3 contains nothing more than a char object. The size of char is 1, so Derived3 fits within the size of 1.
Besides the empty base, Derived1 contains nothing. But because all objects (excluding exceptions involving sub-objects) must have a unique address, the minimum size of any type is 1. Therefore the size of Derived1 is 1.
Also is there any specific significance of size of Derived 4 being 16.
No significance whatsoever. The size is quite typical.

C++ object memory layout

I am trying to understand the object layout for C++ in multiple inheritance.
For this purpose, I have two superclasses A, B and one subclass C.
What I expected to see when trying dumping it was:
vfptr | fields of A | vfptr | fields of B | fields of C.
I get this model but with some zeros that I don't understand.
Here is the code I am trying
#include <iostream>
using namespace std;
class A{
public:
int a;
A(){ a = 5; }
virtual void foo(){ }
};
class B{
public:
int b;
B(){ b = 10; }
virtual void foo2(){ }
};
class C : public A, public B{
public:
int c;
C(){ c = 15; a = 20;}
virtual void foo2(){ cout << "Heeello!\n";}
};
int main()
{
C c;
int *ptr;
ptr = (int *)&c;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
return 0;
}
And here is the output I get:
4198384 //vfptr
0
20 // value of a
0
4198416 //vfptr
0
10 // value of b
15 // value of c
What is the meaning of the zeros in between?
Thanks in advance!

That depends upon your compiler. With clang-500, I get:
191787296
1
20
0
191787328
1
10
15
1785512560
I am sure there's a GDB way too, but this is what I get if I dump pointer-sized words with LLDB at the address of the class object:
0x7fff5fbff9d0: 0x0000000100002120 vtable for C + 16
0x7fff5fbff9d8: 0x0000000000000014
0x7fff5fbff9e0: 0x0000000100002140 vtable for C + 48
0x7fff5fbff9e8: 0x0000000f0000000a
This layout seems sensible, right? Just what you expect.
The reason why that doesn't show as clean in your program as it does in the debugger is that you are dumping int-sized words. On a 64-bit system sizeof(int)==4 but sizeof(void*)==8
So, you see your pointers split into (int,int) pairs. On Linux, your pointers don't have any bit set beyond the low 32, on OSX my pointers do - hence the reason for the 0 vs. 1 disparity

this is hugely architecture and compiler dependant... Possibly for you the size of a pointer might not be the size of an int... What architecture/compiler are you using?

If you're working on a 64-bit system, then:
The first zero is the 4 most-significant-bytes of the first vfptr.
The second zero is padding, so that the second vfptr will be aligned to an 8-byte address.
The third zero is the 4 most-significant-bytes of the second vfptr.
You can check if sizeof(void*) == 8 in order to assert that.

Hard to tell without knowing your platform and compiler, but this might be an alignment issue. In effect, the compiler might attempt to align class data along 8-byte boundaries, with zeroes used for padding.
Without the above details, this is merely speculation.

This is completely dependant on your compiler, system, bitness.
The virtual table pointer will have the size of a pointer. This depends on whether you are compiling your file as 32-bit or 64-bit. Pointers will also be aligned at a multiple address of their size (like any type will typically be). This is probably why you are seeing the 0 padding after the 20.
The integers will have the size of an integer on your specific system. This is usually always 32-bit. Note that if this isn't the case on your machine you will get unexpected results because you are increasing your ptr by sizeof(int) with pointer arithmetic.

If you use MVSC, you can dump all memory layout of all class in your solution with -d1reportAllClassLayout like that:
cl -d1reportAllClassLayout main.cpp
Hope it helpful to you

Loop over POD members

I'm wondering on how to properly loop over the members of a plain old data type, in order to get some type information on them. That is :
struct my_pod
{
int a;
double b;
};
template<typename POD>
void loopOverPOD()
{
for_each(POD, member) // The magic part
{
// member::type should be for my_pod int, then double
typename member::type i;
// member::size_of should be equal to sizeof(int) then sizeof(double)
// Trivial if we can have member::type information.
int size = member::size_of;
// member::offset_of should be equal to 0, then sizeof(int)
// Trivial if we can have member::size_of information.
int offset = member::offset_of;
}
}
As far as I know in C++, we can't do easy type introspection without doing some tricky plays with templates. But here, I can't find a concrete solution with templates, even with the use of macro in fact. And the problem is more about me rather than about the existence of a solution. :-)
I'm not necessarily asking for a solution that would not be intrusive.
Thanks in advance.

You could use boost.fusions ADAPT_STRUCT to turn your POD into a sequence and then use fusions for_each to apply a function object to each member. This is non-intrusive, your POD type will remain POD.
The good thing is that you could even put the ADAPT_STRUCT macros in a (header-) file separate from your struct definitions and only use them in code where you need to iterate.
The flip side is that this macro requires the redundancy of mentioning both the type and the name of the members again. I imagine that at some point fusion will use C++11 features to get rid of that redundancy (mentioning the type again). In the mean time, it is possible to create a macro that will declare the struct and the ADAP_STRUCT part.

If you use C++14 and newer, you can use Boost.Precise and Flat Reflection (https://github.com/apolukhin/magic_get/) for looping over your POD and boost::typeindex::type_id_runtime(field) to print type:
#include <iostream>
#include <boost/pfr/precise.hpp>
#include <boost/pfr/flat.hpp>
#include <boost/type_index.hpp>
struct my_pod
{
int a;
double b;
};
struct my_struct
{
char c;
my_pod pod;
};
int main() {
my_pod val{1, 2.5};
my_struct var{'a', 1, 2.5};
std::cout << "Flat:\n";
boost::pfr::flat_for_each_field(var, [](const auto& field, std::size_t idx) {
std::cout << idx << ": " << boost::typeindex::type_id_runtime(field) << "; value: " << field << '\n';
});
std::cout << "\nNot Flat:\n";
boost::pfr::for_each_field(var, [](const auto& field, std::size_t idx) {
using namespace boost::pfr::ops;
std::cout << idx << ": " << boost::typeindex::type_id_runtime(field) << "; value: " << field << '\n';
});
}
Output for this example:
Flat:
0: char; value: a
1: int; value: 1
2: double; value: 2.5
Not Flat:
0: char; value: a
1: my_pod; value: {1, 2.5}
Though I'm not sure how to get offset in this case...

C++ has no construct to iterate through members of a structure.
There exists however a standard type std::tuple for which you can use templates to recursively iterate through its elements at compile time.

empty base class optimization

Below is a simple test on ebco, I compiled it on both vc9 and g++. The outputs differ on both compilers. What I want to know is that whether the behavior of vc is conformant.
#include <iostream>
class empty
{
};
class empty_one : public empty {};
class empty_two : public empty {};
class non_empty
: public empty_one
, public empty_two
{
};
int main()
{
std::cout << "sizeof(empty): " << sizeof(empty) << std::endl;
std::cout << "sizeof(empty_one): " << sizeof(empty_one) << std::endl;
std::cout << "sizeof(empty_two): " << sizeof(empty_two) << std::endl;
std::cout << "sizeof(non_empty): " << sizeof(non_empty) << std::endl;
std::cout << std::endl;
non_empty a[2];
void* pe10 = static_cast<empty*>(static_cast<empty_one*>(&a[0]));
void* pe20 = static_cast<empty*>(static_cast<empty_two*>(&a[0]));
std::cout << "address of non_empty[0]: " << &a[0] << std::endl;
std::cout << "address of empty of empty_one: " << pe10 << std::endl;
std::cout << "address of empty of empty_two: " << pe20 << std::endl;
std::cout << std::endl;
void* pe11 = static_cast<empty*>(static_cast<empty_one*>(&a[1]));
void* pe21 = static_cast<empty*>(static_cast<empty_two*>(&a[1]));
std::cout << "address of non_empty[1]: " << &a[1] << std::endl;
std::cout << "address of empty of empty_one: " << pe11 << std::endl;
std::cout << "address of empty of empty_two: " << pe21 << std::endl;
}
On vc,
pe20 == pe11. (test1)
Can two sub-objects of two objects have the same address? Is this conformant?
Besides,
pe20 >= &a[0] + sizeof(a[0]) (test2)
Can the address of an sub-object passes the end of an object ?
On g++, above two tests does not hold.
EDIT: In c++0x standard draft, 1.8/6,
Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two distinct objects that are neither bit-fields nor base class subobjects of zero size shall have distinct addresses
The standard requires that two objects have different address when they are neither bit-fields nor base class subobjects of zero size. But it doesn't require that two sub-objects of zero size cannot have same address. So test1 can be true ?

pe10 == pe11. Can two sub-objects of
two objects have the same address? Is
this conformant?
No, two different objects cannot have same address. If they've, the compiler is not Standard Complaint.
By the way, which version of VC++ you're using? I'm using MSVC++2008, and it's output is this:
I think, you meant pe20==pe11? If so, then this also is wrong, non-standard. MSVC++2008 compiler has bug!
GCC is correct; see the output yourself : http://www.ideone.com/Cf2Ov
Similar topic:
When do programmers use Empty Base Optimization (EBO)

pe10 == pe11. Can two sub-objects of two objects have the same address? Is this conformant?
Nopes!
Size of an empty class cannot be zero because in that case two objects of the same class would be having the same address which is not possible.
Similarly two sub-objects of two different objects cannot have the same address. Your compiler is not conformant. Change it!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Compile-time Base class pointer offset to Derive class - c++

Recently I found that the code ((int) (Base2)(Derive)1) - 1 broke down when Derive : public virtual Base2. That is, the offset of virtual base is unknown at compile-time. Therefore, it it forbidden in c++ standard.

Related

What happens after C++ references are compiled？

I don't understand why 'Derived1' requires the same amount of memory as 'Derived3'

C++ object memory layout

Loop over POD members

empty base class optimization

Categories

Resources