C++ object memory layout - c++

I am trying to understand the object layout for C++ in multiple inheritance.
For this purpose, I have two superclasses A, B and one subclass C.
What I expected to see when trying dumping it was:
vfptr | fields of A | vfptr | fields of B | fields of C.
I get this model but with some zeros that I don't understand.
Here is the code I am trying
#include <iostream>
using namespace std;
class A{
public:
int a;
A(){ a = 5; }
virtual void foo(){ }
};
class B{
public:
int b;
B(){ b = 10; }
virtual void foo2(){ }
};
class C : public A, public B{
public:
int c;
C(){ c = 15; a = 20;}
virtual void foo2(){ cout << "Heeello!\n";}
};
int main()
{
C c;
int *ptr;
ptr = (int *)&c;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
ptr++;
cout << *ptr << endl;
return 0;
}
And here is the output I get:
4198384 //vfptr
0
20 // value of a
0
4198416 //vfptr
0
10 // value of b
15 // value of c
What is the meaning of the zeros in between?
Thanks in advance!

That depends upon your compiler. With clang-500, I get:
191787296
1
20
0
191787328
1
10
15
1785512560
I am sure there's a GDB way too, but this is what I get if I dump pointer-sized words with LLDB at the address of the class object:
0x7fff5fbff9d0: 0x0000000100002120 vtable for C + 16
0x7fff5fbff9d8: 0x0000000000000014
0x7fff5fbff9e0: 0x0000000100002140 vtable for C + 48
0x7fff5fbff9e8: 0x0000000f0000000a
This layout seems sensible, right? Just what you expect.
The reason why that doesn't show as clean in your program as it does in the debugger is that you are dumping int-sized words. On a 64-bit system sizeof(int)==4 but sizeof(void*)==8
So, you see your pointers split into (int,int) pairs. On Linux, your pointers don't have any bit set beyond the low 32, on OSX my pointers do - hence the reason for the 0 vs. 1 disparity

this is hugely architecture and compiler dependant... Possibly for you the size of a pointer might not be the size of an int... What architecture/compiler are you using?

If you're working on a 64-bit system, then:
The first zero is the 4 most-significant-bytes of the first vfptr.
The second zero is padding, so that the second vfptr will be aligned to an 8-byte address.
The third zero is the 4 most-significant-bytes of the second vfptr.
You can check if sizeof(void*) == 8 in order to assert that.

Hard to tell without knowing your platform and compiler, but this might be an alignment issue. In effect, the compiler might attempt to align class data along 8-byte boundaries, with zeroes used for padding.
Without the above details, this is merely speculation.

This is completely dependant on your compiler, system, bitness.
The virtual table pointer will have the size of a pointer. This depends on whether you are compiling your file as 32-bit or 64-bit. Pointers will also be aligned at a multiple address of their size (like any type will typically be). This is probably why you are seeing the 0 padding after the 20.
The integers will have the size of an integer on your specific system. This is usually always 32-bit. Note that if this isn't the case on your machine you will get unexpected results because you are increasing your ptr by sizeof(int) with pointer arithmetic.

If you use MVSC, you can dump all memory layout of all class in your solution with -d1reportAllClassLayout like that:
cl -d1reportAllClassLayout main.cpp
Hope it helpful to you

Related

What is the memory lay out of a C++ structure?

I was learning about the structures in C++ and got to know that if a structure in C++ has 3 variables (let each one be of some data type), then all of them are not allocated in a contiguous fashion. Is this correct?
If yes, then how much memory would be allocated to an object to that structure type?
For e.g. Let's say we have a structure like this:
struct a
{
int x;
int y;
char c;
};
Now how intuitively an object of type a, must occupy some space = sizeOf(int) + sizeOf(int) + sizeOf(char). But, if they are not allocated continuously, there could be some memory locations allocated for that object, that are just present for providing some padding - i.e. the memory allocated for that object could look something like this:
xxxx[4-bytes]xxxxx[4-bytes]xxxx[1-byte]xxx
(NOTE: In the above blockquote x corresponds to a memory location of size 1 byte. I also assumed that sizeOf(int) = 4-bytes and sizeOf(char) = 1- byte.)
So in the above one, we can see that the object a occupies more than 9-bytes (because there are some memory locations (x's) used for padding.)
So, does something like this happen?
Thanks for your replies!
P.S. Please let me know if I hadn't written something clearly.
When it comes to structs and classes, The layout is implementation-defined between compilers, os, and architecture... Some will use automatic alignment, others will use padding, some may even auto arrange it's members. If you need to know the size of a struct, use sizeof(Your Struct).
Here's a code snippet...
#include <iostream>
struct A {
char a;
float b;
int c;
};
struct B {
float a;
int b;
char c;
};
int main() {
std::cout << "Sizeof(A) = " << sizeof(A) << '\n';
std::cout << "Sizeof(B) = " << sizeof(B) << '\n';
return 0;
}
Output:
Sizeof(A) = 12
Sizeof(B) = 12
For my particular machine, I'm running Windows 7 - 64bit, It is an Intel Core2 Quad Extreme, and I'm using Visual Studio 2017 running it with C++17.
With my particular setup, both structures are being generated with a different layout, but have the same size in bytes.
In A's case...
char a; // 1 byte
// 3 bytes of padding
float b; // 4 bytes
int c; // 4 bytes (int is 32bit even on x64).
In B's case...
float a; // 4 bytes
int b; // 4 bytes
char c; // 1 byte
// 3 bytes of padding.
Also, your compiler flags and optimizations may have an effect. This isn't always guaranteed, as it is implementation-defined as stated in the standard.
--Edit--
Also, if you don't want this exact behavior there are some pragmas directives and macros such as pragma pack and alignas() that can be used to modify your implementation details. Here are a few references.
How to use alignas to replace pragma pack?
https://en.cppreference.com/w/cpp/preprocessor/impl
https://en.cppreference.com/w/cpp/language/alignas
https://learn.microsoft.com/en-us/cpp/cpp/alignment-cpp-declarations?view=msvc-160
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/pragma_pack.htm
https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbclx01/packqua.htm
https://www.iditect.com/how-to/57426535.html
https://downloads.ctfassets.net/oxjq45e8ilak/1LriV4eAdhNlu9Zv06H9NJ/53576095f772b5f6cddbbedccb7ebd8a/Alexander_Titov_Know_your_hardware_CPU_memory_hierarchy.pdf
https://cpc110.blogspot.com/2020/10/vs2019-alignas-in-struct-definition.html
So, structure Object variables take contiguous memory locations and the variables are stored in memory in the same order in which they are defined.
Please refer to the code below you will get an idea.
struct c{
int a;
int b;
int c;
};
int main()
{
c obj;
cout << &(obj.a) << endl; //0x7ffe128fe1c4
cout << &(obj.b) << endl; //0x7ffe128fe1c8
cout << &(obj.c) << endl; //0x7ffe128fe1cc
return 0;
}
For Structures Padding Concept Refer here

Dereference a structure to get value of first member

I found out that address of first element of structure is same as the address of structure. But dereferencing address of structure doesn't return me value of first data member. However dereferencing address of first data member does return it's value. eg. Address of structure=100, address of first element of structure is also 100. Now dereferencing should work in the same way on both.
Code:
#include <iostream>
#include <cstring>
struct things{
int good;
int bad;
};
int main()
{
things *ptr = new things;
ptr->bad = 3;
ptr->good = 7;
std::cout << *(&(ptr->good)) <<" " << &(ptr->good) << std::endl;
std::cout << "ptr also print same address = " << ptr << std::endl;
std::cout << "But *ptr does not print 7 and gives compile time error. Why ?" << *ptr << std::endl;
return 0;
}
*ptr returns to you an instance of type of things, for which there is no operator << defined, hence the compile-time error.
A struct is not the same as an array†. That is, it doesn't necessarily decay to a pointer to its first element. The compiler, in fact, is free to (and often does) insert padding in a struct so that it aligns to certain byte boundaries‡. So even if a struct could decay in the same way as an array (bad idea), simply printing it would not guarantee printing of the first element!
† I mean a C-Style array like int[]
‡ These boundaries are implementation-dependent and can often be controlled in some manner via preprocessor statements like pragma pack
Try any of these:
#include <iostream>
#include <cstring>
struct things{
int good;
int bad;
};
int main()
{
things *ptr = new things;
ptr->bad = 3;
ptr->good = 7;
std::cout << *(int*)ptr << std::endl;
std::cout << *reinterpret_cast<int*>(ptr) << std::endl;
int* p = reinterpret_cast<int*>(ptr);
std::cout << *p << std::endl;
return 0;
}
You can do a cast of the pointer to Struct, to a pointer to the first element of the struct so the compiler knows what size and alignment to use to collect the value from memory.
If you want a "clean" cast, you can consider converting it to "VOID pointer" first.
_ (Struct*) to (VOID*) to (FirstElem*) _
Also see:
Pointers in Stackoverflow
Hope it helps!!
I found out that address of first element of structure is same as the address of structure.
Wherever you found this out, it wasn't the c++ standard. It's an incorrect assumption in the general case.
There is nothing but misery and pain for you if you continue down this path.

What is (int*)?

I was trying to access the private data members of the class. Everything was going fine until I came upon the int*. I don’t get what it is. I think it’s something that we can use to create a new memory address.
My code :
#include <iostream>
using namespace std;
class x
{
int a, b, c, d;
public:
x()
{
a = 100;
b = 200;
c = 300;
d = 400;
}
};
int main()
{
x ob;
int *y = (int *)&ob;
cout << *y << " " << y[1] << " " << y[2] << " " << y[3] << endl;
}
Can anyone help me in understanding it?
Its a c-style cast to access the memory occupied by the struct x as a set of ints.
It takes the address of ob, casts it from 'address of' (ie a pointer to) x into a pointer to int. The compiler happily assigns this cast to y, so you can manipulate it, or in this case, print out the memory blocks as ints. As the struct happens to be a group of ints anyway, it all works even though its a bit of a hack. I guess the original coder wanted to print out all 4 ints without having to specify each one in turn by variable name. Lazy.
Try using a cast to a char* (ie 1 byte at a time) and print those out. You'll be basically printing out the raw memory occupied by the struct.
A good C++ way would be to create an operator<< function that returns each variable formatted for output like this, then write cout << ob << endl; instead.

Why struct size do not match with plain sum of item's sizes? [duplicate]

This question already has answers here:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
(13 answers)
Closed 9 years ago.
With C/C++ structs, normally, the final struct's size is the plain sum of the sizes of all its elements, but there are cases which this is not true.
I am looking at the technical answer (it has to be one) of why the following struct's size is not equal to the size of all its members:
#include <iostream>
struct {
int a;
long b;
char c;
} typedef MyStruct;
int main() {
MyStruct sss;
std::cout << "Size of int: " << sizeof(int) << std::endl << "Size of long: " << sizeof(long) << std::endl << "Size of char: " << sizeof(char) << std::endl;
std::cout << "Size of MyStruct: " << sizeof(sss) << std::endl;
return 0;
}
Thas has the following output:
Size of int: 4
Size of long: 8
Size of char: 1
Size of MyStruct: 24
So it can be seen than the size of MyStruct is not 4+8+1 (13) rather, it is actually 24, but why?
Because C allows compilers to add padding between the elements of a struct to make the generated code more performant.
Some hardware allows accessing multi-byte data faster when such data is placed at memory addresses divisible by the size of the data. For example, accessing a 32-bit int may go significantly faster when the int is located at address 0x8004 or 0x8008 than when the same int is located at an address 0x8003 or 0x8006. The standard allows compilers to adjust offsets of struct members to make use of such optimization.
Moreover, there are platforms where accessing a multi-byte value not aligned with the memory space results in an error: for example, reading a 16-bit int from an odd address in 68000 architecture triggers bus error. Compilers know this, and they add unused space to your struct so that accessing multi-byte fields does not trigger an error.

Pointer values are different but they compare equal. Why?

A short example outputs a weird result!
#include <iostream>
using namespace std;
struct A { int a; };
struct B { int b; };
struct C : A, B
{
int c;
};
int main()
{
C* c = new C;
B* b = c;
cout << "The address of b is 0x" << hex << b << endl;
cout << "The address of c is 0x" << hex << c << endl;
if (b == c)
{
cout << "b is equal to c" << endl;
}
else
{
cout << "b is not equal to c" << endl;
}
}
It's very surprising to me that the output should be as follows:
The address of b is 0x003E9A9C
The address of c is 0x003E9A98
b is equal to c
What makes me wonder is:
0x003E9A9C is not equal to 0x003E9A98, but the output is "b is equal to c"
A C object contains two sub-objects, of types A and B. Obviously, these must have different addresses since two separate objects can't have the same address; so at most one of these can have the same address as the C object. That is why printing the pointers gives different values.
Comparing the pointers doesn't simply compare their numeric values. Only pointers of the same type can be compared, so first one must be converted to match the other. In this case, c is converted to B*. This is exactly the same conversion used to initialise b in the first place: it adjusts the pointer value so that it points to the B sub-object rather than the C object, and the two pointers now compare equal.
The memory layout of an object of type C will look something like this:
| <---- C ----> |
|-A: a-|-B: b-|- c -|
0 4 8 12
I added the offset in bytes from the Address of the object (in a platform like yours with sizeof(int) = 4).
In your main, you have two pointers, I'll rename them to pb and pc for clarity. pc points to the start of the whole C object, while pb points to the start of the B subobject:
| <---- C ----> |
|-A: a-|-B: b-|- c -|
0 4 8 12
pc-^ pb-^
This is the reason why their values are different. 3E9A98+4 is 3E9A9C, in hex.
If you now compare those two pointers, the compiler will see a comparison between a B* and a C*, which are different types. So it has to apply an implicit conversion, if there is one. pb cannot be converted into a C*, but the other way round is possible - it converts pc into a B*. That conversion will give a pointer that points to the B subobject of wherever pc points to - it is the same implicit conversion used when you defined B* pb = pc;. The result is equal to pb, obviously:
| <---- C ----> |
|-A: a-|-B: b-|- c -|
0 4 8 12
pc-^ pb-^
(B*)pc-^
So when comparing the two pointers, the compiler in fact compares the converted pointers, which are equal.
I know there is an answer but maybe this will be more straightforward and backed-up by an example.
There is an implicit conversion from C* to B* on c operand in here if (b == c)
If you go with this code:
#include <iostream>
using namespace std;
struct A { int a; };
struct B { int b; };
struct C : A, B
{
int c;
};
int main()
{
C* c = new C;
B* b = c;
cout << "The address of b is 0x" << hex << b << endl;
cout << "The address of c is 0x" << hex << c << endl;
cout << "The address of (B*)c is 0x" << hex << (B*)c << endl;
if (b == c)
{
cout << "b is equal to c" << endl;
}
else
{
cout << "b is not equal to c" << endl;
}
}
You get:
The address of b is 0x0x88f900c
The address of c is 0x0x88f9008
The address of (B*)c is 0x0x88f900c
b is equal to c
So c casted to B* type has the same address as b. As expected.
If I may add to Mike's excellent answer, if you cast them as void* then you will get your expected behaviour:
if ((void*)(b) == (void*)(c))
^^^^^^^ ^^^^^^^
prints
b is not equal to c
Doing something similar on C (the language) actually irritated the compiler due to the different types of the pointers compared.
I got:
warning: comparison of distinct pointer types lacks a cast [enabled by default]
In computing (or, rather, we should say in mathematics) there can be many notions of equality. Any relation that is symmetric, reflexive and transitive can be employed as equality.
In your program, you are examining two somewhat different notions of equality: bitwise implementation identity (two pointers being to exactly the same address) versus another kind of equality based on object identity, which allows two views on the same object, through references of different static type, to be properly regarded as referencing the same object.
These differently typed views use pointers which do not have the same address value, because they latch on to different parts of the object. The compiler knows this and so it generates the correct code for the equality comparison which takes into account this offset.
It is the structure of objects brought about by inheritance which makes it necessary to have these offsets. When there are multiple bases (thanks to multiple inheritance), only one of those bases can be at the low address of the object, so that the pointer to the base part is the same as the pointer to the derived object. The other base parts are elsewhere in the object.
So, naive, bitwise comparison of pointers would not yield the correct results according to the object-oriented view of the object.
Some good answers here, but there's a short version. "Two objects are the same" does not mean they have the same address. It means putting data into them and taking data out of them is equivalent.