How is the memory of a class handled in C++? [duplicate] - c++

For example a C++ class:
class A {
    int value;
    void addOne() {
        value++;
    }
};
Will an instance of class A be loaded like this [pseudo code]:
[identifier of A]
[this is int value]
[this is void addOne(void)][value++]
Or like this:
[members identifier of A]
[this is int value]
[functions identifier of A]
[this is void addOne(ref to member of A)][A.value++]
The second should use less memory when there are multiple instances of a class, because the same functions are used for all instances. How is the memory handled in C++? Is it possible to change memory handling?

You are asking if the member functions are stored in the instances of the class? No. Each instance uses the same functions, with its [the instance's] address passed as the hidden parameter this.

The layout is actually more like this:
Class instance:
[this is int value]
Somewhere else:
[this is void addOne(ref to member of A)][A.value++]
That is, an instance of a class consists of (exactly) its member variables and its base class subobjects, plus any padding the compiler inserts, and nothing more (unless the class has virtual functions, in which case each instance also contains a pointer to a virtual function table). In particular, there is no “identifier”.
The same goes for its functions, which are stored somewhere else entirely, and only once, not once for each class instance. Furthermore, a member function doesn’t contain a reference to its class, nor to its members. It is just a block of machine code. When calling the function, you are (basically) jumping to that location after passing a pointer to the class instance, on the call stack or in a register depending on the calling convention. The function can then access the instance (and thus its members) through that pointer.

If your question is about the memory layout of class A: it is the same as that of a struct A holding just the integer value, i.e. 4 bytes (or however many bytes an int takes on that platform).
Functions are not part of the memory layout (e.g. as pointers stored in the object or something similar), so they do not affect the size of the class.
If class A were polymorphic, though, its size would be different, as it would also contain the pointer to the vtable.

Well, in Randall Hyde's Write Great Code, Volume 2 (great book, read it if you have some free time) there's a section that speaks about just that. Briefly, a class holds variables just like a struct does, but if it declares virtual functions it also has a pointer to a record which keeps pointers to those functions (the VMT):
VMT stands for virtual method table,
and these 4 bytes contain a pointer to
an array of “method pointers” for the
class. Virtual methods (also known as
virtual member functions in C++) are
special class-related functions that
you declare as fields in the class.
[...]
Calling a virtual member function
requires two indirect accesses. First,
the program has to fetch the VMT
pointer from the class object and use
that to indirectly fetch a particular
virtual function address from the VMT.
Then the program has to make an
indirect call to the virtual member
function via the pointer it
retrieved from the VMT.
[...]
For a given class there is only one
copy of the VMT in memory. This is a
static object so all objects of a
given class type share the same VMT.

Related

virtual table and _vptr storage scheme

Can someone explain how the virtual tables for the different classes are stored in memory? When we call a function through a pointer, how is the call made using the address? Can we get the size of the memory allocated for a virtual table using a class pointer? I want to see how many memory blocks are used by the virtual table for a class. How can I see it?
class Base
{
public:
    // FunctionPointer *__vptr;  // conceptual only: the compiler adds a hidden pointer like this itself
    virtual void function1() {}
    virtual void function2() {}
};
class D1 : public Base
{
public:
    virtual void function1() {}
};
class D2 : public Base
{
public:
    virtual void function2() {}
};
int main()
{
    D1 d1;
    Base *dPtr = &d1;
    dPtr->function1();
}
Thanks in advance!
The first point to keep in mind is a disclaimer: none of this is actually guaranteed by the standard. The standard says what the code needs to look like and how it should work, but doesn't actually specify exactly how the compiler needs to make that happen.
That said, essentially all C++ compilers work quite similarly in this respect.
So, let's start with non-virtual functions. They come in two classes: static and non-static.
The simpler of the two are static member functions. A static member function is almost like a global function that's a friend of the class, except that it also needs the class's name as a prefix to the function name.
Non-static member functions are a little more complex. They're still normal functions that are called directly--but they're passed a hidden pointer to the instance of the object on which they were called. Inside the function, you can use the keyword this to refer to that instance. So, when you call something like a.func(b);, the code that's generated is pretty similar to the code you'd get for func(&a, b);
Now let's consider virtual functions. Here's where we get into vtables and vtable pointers. We have enough indirection going on that it's probably best to draw some diagrams to see how it's all laid out. Here's pretty much the simplest case: one instance of one class with two virtual functions:
So, the object contains its data and a pointer to the vtable. The vtable contains a pointer to each virtual function defined by that class. It may not be immediately apparent, however, why we need so much indirection. To understand that, let's look at the next (ever so slightly) more complex case: two instances of that class:
Note how each instance of the class has its own data, but they both share the same vtable and the same code--and if we had more instances, they'd still all share the one vtable among all the instances of the same class.
Now, let's consider derivation/inheritance. As an example, let's rename our existing class to "Base", and add a derived class. Since I'm feeling imaginative, I'll name it "Derived". As above, the base class defines two virtual functions. The derived class overrides one (but not the other) of those:
Of course, we can combine the two, having multiple instances of each of the base and/or derived class:
Now let's delve into that in a little more detail. The interesting thing about derivation is that we can pass a pointer/reference to an object of the derived class to a function written to receive a pointer/reference to the base class, and it still works--but if you invoke a virtual function, you get the version for the actual class, not the base class. So, how does that work? How can we treat an instance of the derived class as if it were an instance of the base class, and still have it work? To do it, each derived object has a "base class subobject". For example, let's consider code like this:
struct simple_base {
int a;
};
struct simple_derived : public simple_base {
int b;
};
In this case, when you create an instance of simple_derived, you get an object containing two ints: a and b. The a (base class part) is at the beginning of the object in memory, and the b (derived class part) follows that. So, if you pass the address of the object to a function expecting an instance of the base class, it uses only the part(s) that exist in the base class, which the compiler places at the same offsets in the object as they'd be in an object of the base class, so the function can manipulate them without even knowing that it's dealing with an object of the derived class. Likewise, if you invoke a virtual function, all it needs to know is the location of the vtable pointer. As far as it cares, something like Base::func1 basically just means it follows the vtable pointer, then uses a pointer to a function at some specified offset from there (e.g., the fourth function pointer).
At least for now, I'm going to ignore multiple inheritance. It adds quite a bit of complexity to the picture (especially when virtual inheritance gets involved) and you haven't mentioned it at all, so I doubt you really care.
As to accessing any of this, or using in any way other than simply calling virtual functions: you may be able to come up with something for a specific compiler--but don't expect it to be portable at all. Although things like debuggers often need to look at such stuff, the code involved tends to be quite fragile and compiler-specific.
The virtual table is shared between instances of a class. More precisely, it lives at the "class" level rather than the instance level. Each instance carries the overhead of a pointer to the virtual table if there are virtual functions anywhere in its class hierarchy.
The table itself is at least the size necessary to hold a pointer for each virtual function. Other than that, it is an implementation detail how it's actually defined. Check here for a SO question with more details about this.
First of all, the following answer contains almost everything you want to know regarding virtual tables:
https://stackoverflow.com/a/16097013/8908931
If you are looking for something a little more specific (with the regular disclaimer that this might change between platforms, compilers, and CPU architectures):
When needed, a virtual table is created for a class. The class will have only one instance of the virtual table, and each object of the class will have a pointer to the memory location of this virtual table. The virtual table itself can be thought of as a simple array of pointers.
When you assign the address of the derived object to the base pointer, the object it points to still contains the pointer to the virtual table. This means that, through the base pointer, you reach the virtual table of the derived class. The compiler turns the call into a lookup at a fixed offset in that virtual table, which holds the actual address of the function from the derived class.
Not really. Usually at the start of an object, there is a pointer to the virtual table itself. But this will not help you too much, as it is just an array of pointers, with no real indication of its size.
Making a very long answer short: For an exact size you can find this information in the executable (or in segments loaded from it to the memory). With enough knowledge of how the virtual table works, you can get a pretty accurate estimation, given you know the code, the compiler, and the target architecture.
For the exact size, you can find this information either in the executable or in the segments loaded from it into memory. On Linux, an executable is usually an ELF file; files of this kind contain the information needed to run a program. Part of this information is symbols for various kinds of language constructs such as variables, functions and virtual tables. For each symbol, the file records the size it takes in memory. So, bottom line, you will need the symbol name of the virtual table and enough knowledge of ELF to extract what you want.
The answer that Jerry Coffin gave is excellent in explaining how virtual function pointers work to achieve runtime polymorphism in C++. However, I believe that it is lacking in answering where in memory the vtable is stored. As others have pointed out this is not dictated by the standard.
However, there is an excellent blog post(s) by Martin Kysel that goes into great detail about where virtual tables are stored. To summarize the blog post(s):
One vtable is created for every class (not instance) with virtual functions. Each instance of this class points to the same vtable in memory
Each vtable is stored in read only memory of the resulting binary file
The disassembly for each function in the vtable is stored in the text section of the resulting ELF binary
Attempting to write over the vtable, located in read only memory, results in a Segmentation fault (as expected)
Each class has a pointer to a list of functions. The functions are in the same order for derived classes; an overridden function simply replaces the entry at its position in the list.
When you point to an object through a base pointer type, the pointed-to object still has the correct _vptr.
Base's
Base::function1()
Base::function2()
D1's
D1::function1()
Base::function2()
D2's
Base::function1()
D2::function2()
Classes further derived from D1 or D2 will just add their new virtual functions to the list below the 2 current ones.
When calling a virtual function we just call the corresponding index, function1 will be index 0
So your call
dPtr->function1();
is conceptually
dPtr->_vptr[0](dPtr);

C++ object representation

I have a doubt: I can declare a pointer to a class member function
void (MyClass::*myFunc)(void);
and I can declare a pointer to a class member variable
int (MyClass::*var);
My question is: how is an object (composed of member functions and member variables) structured in memory (asm-level)?
I'm not sure because, except for polymorphism and runtime virtual functions, I can declare a pointer to a member function even without an object, and this implies that the function code is shared among all objects of the class (although the functions require a this pointer to work properly).
But what about the variables? How come I can declare a pointer to a member variable even without an object instance? Of course I need one to use it, but the fact that I can declare a pointer without an object makes me think a class object in memory represents its variables with pointers to other memory regions.
I'm not sure if I explained properly my doubt, if not just let me know and I'll try to explain it better
Classes are stored in memory quite simply, almost the same way as structures. If you inspect the memory where a class instance is stored, you'll notice that its fields are simply laid out one after another (possibly with padding between them).
There's a difference, though, if your class has virtual methods. In that case, the first thing stored in a class instance is typically a pointer to a virtual method table, which allows virtual methods to work properly. You can read more about this on the Internet; it's a slightly more advanced topic. Luckily, you don't have to worry about that, as the compiler does it all for you (I mean handling the VMT, not the worrying).
Let's go to the methods. When you see:
void MyClass::myFunc(int i, int j) { }
Actually the compiler converts it into something like:
void myFunc(MyClass * this, int i, int j) { }
And when you call:
myClassInstance->myFunc(1, 2);
Compiler generates the following code:
myFunc(myClassInstance, 1, 2);
Please keep in mind that this is a simplification. Sometimes it's a little more complicated than this (especially when we discuss virtual method calls), but it shows more or less how classes are handled by the compiler. If you use a low-level debugger such as WinDbg, you can inspect the parameters of the method call and you'll see that the first parameter is usually a pointer to the class instance you called the method on.
Now, all instances of the same class share their methods' binaries (compiled code). There is no point in making a copy of them for each class instance, so only one copy is held in memory and all instances use it. It should be clear now why you can get a pointer to a method even if you have no instance of the class.
However, if you want to call the method kept in a variable, you always have to provide a class instance, which can be passed by the hidden "this" parameter.
Edit: In response to comments
You can read more about pointers to members in another SO question. I guess that a pointer to member stores the offset of the specified field from the beginning of the class instance. When you try to retrieve the value of a field using the pointer to member, the compiler locates the beginning of the class instance and moves by the number of bytes stored in the pointer to member to reach the specified field.
Each class instance has its own copy of non-static fields - otherwise they wouldn't be of much use to us.
Notice that, similarly to pointers to methods, you cannot use a pointer to member directly; you again have to provide a class instance.
A proof of what I say would be in order, so here it is:
class C
{
public:
int a;
int b;
};
// Disassembly of fragment of code:
int C::*pointerToA = &C::a;
00DB438C mov dword ptr [pointerToA],0
int C::*pointerToB = &C::b;
00DB4393 mov dword ptr [pointerToB],4
Can you see the values stored in pointerToA and pointerToB? Field a is 0 bytes from the beginning of the class instance, so the value 0 is stored in pointerToA. Field b, on the other hand, is stored after field a, which is 4 bytes long, so the value 4 is stored in pointerToB.

Memory locations of classes (refers to Inside the C++ object model book)

I am currently reading Inside the C++ Object Model. On page 9 it has a diagram showing how the contents of a class are laid out in memory. It states that the only parts of a class which actually reside in the object's memory are the non-static data members.
Here is a post from SO regarding the contents of memory for a program:
Global memory management in C++ in stack or heap?
In the second answer it details the memory layout of a program- showing the stack and the heap.
Does the location of the static data members/any class function (basically the parts of the class which are not stored within the object- referring to page 9) change depending whether the object is on the stack or the heap?
Static data members reside in the same area of memory that global variables and plain static variables would reside. It is the "class memory" that could either be on the stack or heap, depending on how the instance of the class was created.
A static data member is not too different from a global variable. However, it is scoped by the class name, and its access by name can be controlled via public, private, and protected. public gives access to everyone. private would restrict access to only members of the class, and protected is like private but extends access to a class that inherits from the class with the static data member.
In contrast, a global variable is accessible by name by everyone. A plain static variable is accessible by name by code in the same source file.
A plain class method is actually just a regular function (modulo access controls), but it has an implicit this parameter. Such methods do not occupy any space in an object. However, a class with virtual methods does occupy some extra memory per object (typically one pointer to a table of function addresses), since a call has to resolve to the derived class's implementation of the method. But polymorphism is likely not yet covered where you are in your textbook.
No, where variables are allocated doesn't affect the storage of static data or code. These are generally stored in separate memory areas that are neither stack nor heap.
Functions and static data members are special in that there is only one copy of each in the whole program.
Variables of class, or other, types are most often created and destroyed multiple times during a program run.

Where is the virtual function table for C++ class stored?

I tried to find where exactly the virtual function table gets stored for a C++ class.
I found some answers saying it's a "static array of function pointers".
So will it get stored in the read-only (initialised) data segment?
Most probably yes. However, it's not mandated. It's not even mandated that polymorphism is implemented via virtual function table, but on most platforms it is. These are implementation details, as long as a compiler obeys the behavior set by the standard, it can do whatever it wants.
A vftable is one per class and stored in only one place in memory.
When you make any function virtual, the compiler will insert a vptr inside your class. As a result, the size of the class will grow by 4 bytes (on Win32). This pointer holds the address of the virtual table (vtable). The vtable is constructed by the compiler at compile time and is basically nothing but an array of function pointers. The function pointers are actually pointers to the virtual functions of that particular class. To be more exact, the virtual table is a static array of function pointers, so that different instances of the same class can share that vtable. Since static data is stored in a data section of the executable, the vtable is stored in a data section as well; which section exactly (.data or a read-only one) depends on the toolchain.
It's implementation dependent, yes.
And for g++ (4.9.0), the virtual table (not the pointer) is stored in the .rodata section of the ELF file and in the corresponding LOAD segment in memory.

CreateFileMapping with class containing virtual methods

I want to create an instance of a class and place it in shared memory so the same instance can be called from multiple processes. However, this class has virtual methods which I think may cause problems as I have read the mapped data can't contain pointers, which would be the case here with the vtable in the class. Will it work?
As Kerrek SB commented, you cannot map a class containing virtual methods. But you can probably make a simple struct or class without virtuals, map that, and then give a pointer to it to another class which does have virtuals and uses the plain struct as its implementation. Basically, the Pimpl idiom.
If needed, you can even do something like virtual dispatch yourself by storing a "type" integer in the plain struct, and inspecting it to decide which functions to invoke.