Here is a simple C++ class, named A:
class A
{
public:
explicit A() : m_a(0) { }
explicit A(int a) : m_a(a) { }
int getA() const { return m_a; }
void setA(int a) { m_a = a; }
private:
int m_a;
};
This is what I know so far:
When you declare an object of a class, memory gets allocated for that object. The allocated memory is equivalent to the memory of its members summed up. So in my case, it is: sizeof(A) = sizeof(int) = sizeof(m_a)
All member functions of class A are stored somewhere in memory and all instances of class A use the same member functions.
This is what I don't know:
Where are member functions stored and how are they actually stored? Let's say that an int, for example, is stored on 4 bytes; I can imagine the RAM memory layout with 4 contiguous cells each storing a part of that int. How can I imagine this layout for a function? (This could sound silly, but I imagine functions must have a place in memory because you can have a pointer point to them.) Also, how and where are function instructions stored? My first perception was that functions and function instructions are stored in the program executable (and its dynamic or static libraries), but if this is true, what happens when you create a function pointer? AFAIK function pointers point to locations in RAM memory; can they point to locations in program binaries? If yes, how does this work?
Can anyone explain to me how this works and point out if what I know is right or wrong?
First, you need to understand the role of the linker and what are executables (usually executed in virtual memory) and address spaces & processes. On Linux, read about ELF and the execve(2) syscall. Read also Levine's Linkers & Loaders book and Operating Systems: Three Easy Pieces, and the C++11 standard n3337, and this draft report and a good C++ programming book, with this reference website.
Member functions can be virtual or plain functions.
A plain (non virtual) member function is just like a C function (except that it has this as an implicit, often first, parameter). For example your getA method is implemented like the following C function (outside of the object, e.g. in the code segment of the binary executable) :
int C$getA(const A* thisptr) { return thisptr->m_a; }
then imagine that the compiler is translating p->getA() into C$getA(p)
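That translation can be sketched in compilable C++. This is only an illustration under assumptions: Counter and Counter_getA are hypothetical names (not the question's A, whose m_a is private), and the free function is a hand-written approximation of what a compiler might emit.

```cpp
#include <cassert>

// Hypothetical class mirroring the question's A (public member for brevity).
struct Counter {
    int m_a;
    int getA() const { return m_a; }   // compiled once, stored with the code
};

// Roughly what the compiler generates for Counter::getA:
// an ordinary function that receives the object as an explicit argument.
int Counter_getA(const Counter* thisptr) { return thisptr->m_a; }

// The member function contributes nothing to the object's size.
static_assert(sizeof(Counter) == sizeof(int), "only m_a is stored in the object");

// Returns true when the member call and the free-function call agree.
bool same_result(int value) {
    Counter c{value};
    return c.getA() == Counter_getA(&c);
}
```

Calling same_result with any value should return true, since both paths read the same m_a through the same pointer.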
A virtual member function is generally implemented through a vtable (virtual method table). An object with some virtual member functions (including a virtual destructor) generally has, as its first (implicit) member field, a pointer to such a table (generated elsewhere by the compiler). Your class A doesn't have any virtual method, but imagine it had an additional virtual void print(std::ostream&); method. Then your class A would have the same layout as
struct A$ {
struct A$virtualmethodtable* _vptr;
int m_a;
};
and the virtual table might be
struct A$virtualmethodtable {
void (*print$fun) (struct A$*, std::ostream*);
};
(so adding other virtual functions simply means adding slots to that vtable);
and then a call like p->print(std::cout); would be translated almost like
p->_vptr->print$fun(p,&std::cout); ... In addition, the compiler would generate as constant tables various virtual method tables (one per class).
NB: things are more complex with multiple or virtual inheritance.
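In both cases, member functions don't eat any additional space in the object. If it is non-virtual, it is just a plain function (in the code segment). If it is virtual, its code is also a plain function in the code segment, and its address occupies one slot in the per-class virtual method table; the object itself only carries the single vtable pointer.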
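The vtable mechanism described above can be imitated by hand in ordinary C++. The sketch below is not what any particular compiler emits; Shape, ShapeVTable and the other names are hypothetical, and real ABIs store extra data (such as type information) next to the function slots.

```cpp
#include <cassert>
#include <string>

struct Shape;

// One slot per "virtual" function.
struct ShapeVTable {
    std::string (*name)(const Shape*);
};

struct Shape {
    const ShapeVTable* vptr;   // the only per-object cost of dispatch
    int m_a;
};

// The implementations are ordinary functions living in the code segment.
std::string circle_name(const Shape*) { return "circle"; }
std::string square_name(const Shape*) { return "square"; }

// One constant table per "class", shared by all of its objects.
const ShapeVTable circle_vtable = { circle_name };
const ShapeVTable square_vtable = { square_name };

// What a call like p->name() boils down to: load vptr, load slot, call.
std::string call_name(const Shape* p) { return p->vptr->name(p); }
```

Two Shape objects that point at different tables answer call_name differently, even though call_name itself never inspects which "class" it was given.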
NB. If you compile with a recent GCC (i.e. with g++) you could pass it e.g. the -fdump-tree-all flag: it will produce hundreds of dump files showing, in textual form, some of the compiler's internal representations, which you could inspect with a pager (e.g. less) or a text editor. This flag is specific to GCC; Clang (clang++) offers different dump options of its own. You could also use MELT or look at the assembly code produced with g++ -S -fverbose-asm -O1 ....
All functions, virtual or not, are stored in the code/text segment. Local non-static variables live on the stack.
All static and global variables are saved in static data segment.
A class with virtual functions, or one inherited from a class with virtual functions, has a vptr pointer inserted by the compiler. The vptr points to a virtual function table which has a number of function slots. Each slot contains the address of a function, which is itself stored in the code segment.
To understand this you need to learn about the memory layout of a program. The code is shared by all objects, and each object has its own copy of the data.
Whenever an object of a class is created, memory is allocated for it. So my question is: is memory allocated only for the member variables, or for the member functions as well? If memory is allocated for member functions, where are they stored?
Traditionally executable files had three sections. One for initialized data, one for uninitialized data, and one for code. This traditional partitioning is still very much in use, and code, no matter where it comes from, is placed in a separate section.
When an operating system loads an executable file into memory, it puts the code in a separate place in memory that it marks as executable (on modern memory-protected systems), and all code is stored there, separate from the objects themselves.
Member functions are just code located in the code segment. They are present exactly once, no matter how many objects you have.
They are almost exactly the same as ordinary functions, except that their first parameter is the this pointer, which is hidden in the language but present as a parameter in the executable code.
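One way to see that the code exists once and only receives a different this each time is a pointer to member function. A minimal sketch, assuming a hypothetical Sensor class that is not part of the question:

```cpp
#include <cassert>

struct Sensor {                  // hypothetical class
    int reading;
    int scaled() const { return reading * 2; }
};

// A pointer to member function identifies the code, not any particular object.
using ScaledFn = int (Sensor::*)() const;

// The object is supplied separately at the call site; it becomes `this`.
int call_on(const Sensor& s, ScaledFn fn) { return (s.*fn)(); }

static_assert(sizeof(Sensor) == sizeof(int), "the function is not stored in the object");
```

The same ScaledFn value, &Sensor::scaled, can be applied to any number of Sensor objects; only the hidden this argument changes between calls.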
But there are two kinds of member functions:
"normal"
virtual
There is no difference between them in terms of code size; however, they are called differently. Calls to normal functions can be resolved at compile time, while calls to virtual functions are indirect calls through function pointers.
If a class has virtual member functions (the class is "polymorphic"), the compiler needs to create a "vtable" for this class (not for each object).
Each object then contains a pointer to the vtable of its class. This is necessary to reach the correct virtual function when the object is accessed through a pointer of a base class type.
Example:
class A{
public: bool doSomething();
int i;
};
class B:public A {
public: bool doSomething();
int j;
};
//
B b;
A* a = &b;
a->doSomething(); // <- A::doSomething() is called;
//
Neither of these classes needs a vtable.
Example 2:
class A{
public: virtual bool doSomething();
int i;
};
class B:public A {
public: bool doSomething();
int j;
};
//
B b;
A* a = &b;
a->doSomething(); // <- B::doSomething() is called;
//
A (and all its derived classes) gets a vtable. When an object is created, the object's vtable pointer is set to the correct table, so that the correct function is called independently of the type of the pointer.
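Both examples can be combined into one small check. This is a sketch with hypothetical names (Base, Derived, plain, dynamic); the functions return strings so the difference between the two kinds of call is easy to observe.

```cpp
#include <cassert>
#include <string>

struct Base {
    // Non-virtual: the call target is fixed by the static type at compile time.
    std::string plain() const { return "Base::plain"; }
    // Virtual: the call target is looked up through the object's vtable.
    virtual std::string dynamic() const { return "Base::dynamic"; }
    virtual ~Base() = default;
};

struct Derived : Base {
    std::string plain() const { return "Derived::plain"; }   // hides, does not override
    std::string dynamic() const override { return "Derived::dynamic"; }
};

// Both helpers only know the static type Base.
std::string call_plain(const Base& b)   { return b.plain(); }
std::string call_dynamic(const Base& b) { return b.dynamic(); }
```

Passing a Derived object to both helpers yields "Base::plain" from the first and "Derived::dynamic" from the second.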
Only the member variables (plus padding between and after them) contribute to the sizeof of a class.
So in that sense regular functions do not take up space as far as an object is concerned. Member functions are little more than regular static functions with an implicit this pointer as a hidden argument.
That said, a virtual function table is the way an implementation typically deals with polymorphic types, and that will take up some space, but it will probably only be a pointer to a table used by all objects of that particular class.
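The size claim is easy to check with sizeof. A minimal sketch with hypothetical types Plain and Poly; the exact overhead depends on the ABI (usually one pointer plus alignment padding), so the code measures the difference instead of assuming a number.

```cpp
#include <cassert>
#include <cstddef>

struct Plain {
    int value;
    int get() const { return value; }
};

struct Poly {
    int value;
    virtual int get() const { return value; }
    virtual ~Poly() = default;
};

static_assert(sizeof(Plain) == sizeof(int), "non-virtual functions cost nothing per object");

// Bytes added per object by making the class polymorphic.
// The standard does not fix this number; it is implementation-defined.
constexpr std::size_t polymorphic_overhead() { return sizeof(Poly) - sizeof(Plain); }
```

On a typical 64-bit desktop ABI the overhead is larger than a bare pointer because of alignment padding after the int, which is why measuring is more reliable than guessing.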
How to identify whether vptr will be used to invoke a virtual function?
Consider the below hierarchy:
class A
{
int n;
public:
virtual void funcA()
{std::cout <<"A::funcA()" << std::endl;}
};
class B: public A
{
public:
virtual void funcB()
{std::cout <<"B::funcB()" << std::endl;}
};
A* obj = new B();
obj->funcB(); //1. this does not even compile
typedef void (*fB)();
fB* func;
int* vptr = (int*)obj; //2. Accessing the vptr
func = (fB*)(*vptr);
func[1](); //3. Calling funcB using vptr.
Statement 1, i.e. obj->funcB();, does not even compile although the vtable has an entry for funcB, whereas funcB() can be invoked successfully by accessing the vptr indirectly.
How does compiler decide when to use the vTable to invoke a function?
In the statement A* obj = new B(); since I am using a base class pointer so I believe vtable should be used to invoke the function.
Below is the memory layout when vptr is accessed indirectly.
So there are two answers to your question:
The short one is:
obj->funcB() is only a legal call if the static type of obj (in this case A) has a function funcB with the appropriate signature (either directly or through a base class). Only if that is the case does the compiler decide whether to translate it to a direct or a dynamic function call (e.g. using a vtable), based on whether funcB is declared virtual or not in the declaration of A (or its base type).
The longer one is this:
When the compiler sees obj->funcB() it has no way of knowing (optimizations aside), what the runtime type of obj is and especially it doesn't know, whether a derived class that implements funcB() exists, at all. obj might e.g. be created in another translation unit or it might be a function parameter.
And no, that information is usually not stored in the virtual function table:
The vtable is just an array of addresses, and without the prior knowledge that a specific address corresponds to a function called funcB, the compiler can't use it to implement the call obj->funcB(), or to be more precise: it is not allowed to do so by the standard. That prior knowledge can only be provided by a virtual function declaration in the static type of obj (or its base classes).
The reason why you have that information available in the debugger (whose behavior lies outside of the standard anyway) is that it has access to the debugging symbols, which are usually not part of the distributed release binary. Storing that information in the vtable by default would be a waste of memory and performance, as the program isn't allowed to make use of it in standard C++ in the way you describe anyway. For extensions like C++/CLI that might be a different story.
Adding to Barry's comment, adding the line virtual void funcB() = 0; to class A seems to fix the problem.
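If changing A is not an option, a different technique that none of the answers above mention is to recover the dynamic type with dynamic_cast and call funcB through a B*. The sketch below adapts the question's classes so that the functions return strings instead of printing; call_funcB is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

struct A {
    virtual std::string funcA() { return "A::funcA()"; }
    virtual ~A() = default;
};

struct B : A {
    virtual std::string funcB() { return "B::funcB()"; }
};

// funcB is not declared in A, so obj->funcB() would not compile here.
// dynamic_cast checks the runtime type and yields nullptr when obj is not a B.
std::string call_funcB(A* obj) {
    if (B* b = dynamic_cast<B*>(obj)) {
        return b->funcB();
    }
    return "";
}
```

Unlike the raw vptr arithmetic in the question, this stays within standard C++ and fails safely when the object is a plain A.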
I'll start by saying I understand that only nonstatic member functions can be virtual, but this is what I want:
A base class defining an interface: so I can use base class pointers to access functions.
For memory management purposes (this is an embedded system with limited ram) I want the overriding functions to be statically allocated. I accept the consequence that with a static function, there will be constraints on how I can manipulate data in the function.
My current thinking is that I may keep a light overloading function by making it a wrapper for a function that actually is static.
Please forbear telling me I need to re-think my design. This is why I am asking the question. If you'd like to tell me I'm better off using C and using callbacks, please direct me to some reading material to explain the pitfalls of using an object-oriented approach. Is there an object-oriented design pattern which meets the requirements I have enumerated?
Is there an object-oriented design pattern which meets the requirements I have enumerated?
Yes, plain old virtual functions. Your desire is "the overriding functions to be statically allocated." Virtual functions are statically allocated. That is, the code which implements the functions exists once, and only once, and is fixed at compile/link time. Depending upon your linker command, they are as likely to be stored in flash as any other function.
class I {
public:
virtual void doit() = 0;
virtual void undoit() = 0;
};
class A : public I {
public:
virtual void doit () {
// The code for this function is created statically and stored in the code segment
std::cout << "hello, ";
}
virtual void undoit () {
// ditto for this one
std::cout << "HELLO, ";
}
};
class B : public I {
public:
int i;
virtual void doit() {
// ditto for this one
std::cout << "world\n";
}
virtual void undoit() {
// yes, you got it.
std::cout << "WORLD\n";
}
};
int main () {
B b; // So, what is stored inside b?
// There are sizeof(int) bytes for "i",
// There are probably sizeof(void*) bytes for the vtable pointer.
// Note that the vtable pointer doesn't change size, regardless of how
// many virtual methods there are.
// sizeof(b) is probably 8 bytes on a 32-bit target, or 16 on a typical 64-bit one.
}
For memory management purposes (this is an embedded system with
limited ram) I want the overriding functions to be statically
allocated.
All functions in C++ are always statically allocated. The only exception is if you manually download and utilize a JIT.
Static member functions are just plain functions (like non-member functions), that are inside the namespace of the class. That means you can treat them like plain functions, and the following solution should do:
class Interface
{
public:
void (*function) ();
};
class Implementation: public Interface
{
public:
Implementation()
{
function = impl_function;
}
private:
static void impl_function()
{
// do something
}
};
then
Implementation a;
Interface* b = &a;
b->function(); // will do something...
The problem with this approach is that you would be doing by hand almost exactly what the compiler does for you when you use virtual member functions, except that the compiler does it better (it needs less code, is less error-prone, and the pointers to the implementation functions are shared per class instead of stored in every object). The main difference is that with virtual, your function would receive the (invisible) this parameter when called, and you would be able to access the member variables.
Thus, I would recommend to you to simply not do this, and use ordinary virtual methods.
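For the embedded case, ordinary virtual methods combine well with objects that have static storage duration, so nothing is allocated on the heap. A minimal sketch under assumptions: Driver, FixedDriver and sum_readings are hypothetical names, and whether the tables and code end up in flash depends on the toolchain and linker script.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interface; objects are never deleted through it,
// so the destructor is protected and non-virtual.
class Driver {
public:
    virtual int read() = 0;
protected:
    ~Driver() = default;
};

class FixedDriver : public Driver {
public:
    explicit FixedDriver(int v) : value_(v) {}
    int read() override { return value_; }
private:
    int value_;
};

// Objects with static storage duration: no heap allocation involved.
static FixedDriver driver_a(10);
static FixedDriver driver_b(32);
static Driver* const drivers[] = { &driver_a, &driver_b };

// Polls every driver through the base-class pointer.
int sum_readings() {
    int total = 0;
    for (std::size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); ++i) {
        total += drivers[i]->read();
    }
    return total;
}
```

Each object costs its data members plus one vtable pointer; the code for read() exists once per class.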
The overhead with virtual functions is two-fold: besides the code for the actual implementations (which resides in the code segment, just like any other function you write), there is the virtual function table, and there are pointers to that table. The virtual function table is present once for each derived class, and its size depends on the number of virtual functions. Every object must carry a pointer to its virtual function table.
My point is, the per-object overhead of virtual functions is the same no matter how many virtual functions you have, or how much code they contain. So the way you arrange your virtual functions should have little impact on your memory consumption, once you have decided that you want some degree of polymorphism.
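That point can be checked directly. The sketch below uses two hypothetical classes that differ only in how many virtual functions they declare; the equality holds on the common vtable-based ABIs, though the standard itself does not promise it.

```cpp
#include <cassert>

struct OneVirtual {
    int data;
    virtual void f1() {}
    virtual ~OneVirtual() = default;
};

struct ManyVirtuals {
    int data;
    virtual void f1() {}
    virtual void f2() {}
    virtual void f3() {}
    virtual void f4() {}
    virtual ~ManyVirtuals() = default;
};

// True when the extra virtual functions left the object size unchanged.
// Only the per-class table grows; each object still holds one pointer.
constexpr bool same_object_size() {
    return sizeof(OneVirtual) == sizeof(ManyVirtuals);
}
```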
Is there any way to force a compiler (well GCC specifically) to make a class compile to object oriented C? Specifically what I want to achieve is to write this:
class Foo {
public:
float x, y, z;
float bar();
int other();
...etc
};
Foo f;
float result = f.bar();
int obSize = sizeof(Foo);
Yet compile to exactly the same as:
struct Foo { float x, y, z; };
float Foo_bar(Foo *this);
Foo f;
float result = Foo_bar(&f);
int obSize = sizeof(Foo);
My motivation is to increase readability, yet not suffer a memory penalty for each object of Foo. I'd imagine that with the class implementation, obSize would normally be
obSize = sizeof(float)*3 + sizeof(void*)*number_of_class_methods
Mostly to use c++ classes in memory constrained microcontrollers. However I'd imagine if I got it to work I'd use it for network serialization as well (on same endian machines of course).
Your compiler actually does exactly that for you. It might even be able to optimize optimistically by putting the this pointer in a register instead of pushing it onto the stack (this is at least what MSVC does on Windows), which you wouldn't be able to do with standard C calling convention.
As for:
obSize = sizeof(float)*3 + sizeof(void*)*number_of_class_methods
It is plain false. Did you even try it?
Even if you had virtual functions, only one pointer to a table of functions would be added to each object (one table per class). With no virtual functions, nothing is added to an object beyond its members (and no function table is generated).
void* represents a pointer to data, not a pointer to code (they need not have the same size).
There is no guarantee that the size of the equivalent C struct is 3 * sizeof(float).
C++ already does what you're talking about for non-polymorphic classes (classes without a virtual method).
Generally speaking, a C++ class will have the same size as a C struct, unless the class contains one or more virtual methods, in which case the overhead will be a single pointer (often called the vptr) for each class instance.
There will also be a single instance of a 'vtbl' that has a set of pointers for each virtual function - but that vtbl will be shared among all objects of that class type (ie., there's a single vtbl per-class type, and the various vptrs for objects of that class will point to the same vtbl instance).
In summary, if your class has no virtual methods, it will be no larger than the same C struct. This fits with the C++ philosophy of not paying for what you don't use.
However, note that non-static member functions in a C++ class do take an extra parameter (the this pointer) that isn't explicitly mentioned in the parameter list - that is essentially what you discuss in your question.
footnote: in C++ classes and structs are the same except for the minor difference of default member accessibility. In the above answer, when I use the term 'class', the behavior applies just as well to structs in C++. When I use the term 'struct' I'm talking about C structs.
Also note that if your classes use inheritance, the 'overhead' of that inheritance depends on the exact variety of inheritance. But as in the difference between polymorphic and non-polymorphic classes, whatever that cost might be, it's only brought in if you use it.
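The comparison between the class and the C-style struct can be verified at compile time. In this sketch FooC stands for the plain struct from the question, the bodies of bar() and other() are made up for illustration, and from_bytes is a hypothetical helper showing the byte-wise copy a serialization layer might do.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// The plain-struct version from the question.
struct FooC { float x, y, z; };

// The class version; method bodies are invented for the example.
class Foo {
public:
    float x, y, z;
    float bar() const { return x + y + z; }
    int other() const { return 3; }
};

static_assert(sizeof(Foo) == sizeof(FooC), "methods add nothing to the object");
static_assert(std::is_standard_layout<Foo>::value, "layout matches a C struct");
static_assert(std::is_trivially_copyable<Foo>::value, "safe to copy byte by byte");

// Copies the raw bytes of a FooC into a Foo.
Foo from_bytes(const FooC& raw) {
    Foo f;
    std::memcpy(&f, &raw, sizeof f);
    return f;
}
```

If a virtual function were added to Foo, the first two static_asserts would be expected to fail on common ABIs, which is a convenient guard for code that relies on the plain layout.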
No, your assumption is wrong. Class methods take up no space at all in an object. Why not write a class and take its sizeof? Then add a few more methods and print the sizeof again. You will see that it hasn't changed. Something like this:
First program
#include <iostream>
class X
{
public:
int y;
void method1() {}
};
int main()
{
std::cout << sizeof(X) << '\n'; // prints 4 where int is 4 bytes
}
Second program
#include <iostream>
class X
{
public:
int y;
void method1() {}
void method2() {}
void method3() {}
void method4() {}
void method5() {}
void method6() {}
};
int main()
{
std::cout << sizeof(X) << '\n'; // also prints 4 where int is 4 bytes
}
Actually, I believe there is no specific memory penalty with using classes, since member functions are stored only once, no matter how many instances of the class exist. So your memory footprint would be more like (code for the class methods, stored once) + N*sizeof(float)*3 where you have N instances of Foo.
The only time you get an additional penalty is when using virtual functions in which case each object carries around a pointer to a vtable with it.
You need to test, but as far as I know a class instance only stores a pointer related to its methods if some of those methods are virtual; otherwise, a struct and a class will take roughly the same amount of memory (bar different alignment done by different compilers etc).