Can GCC compile classes to work as structs? - c++

Is there any way to force a compiler (well GCC specifically) to make a class compile to object oriented C? Specifically what I want to achieve is to write this:
class Foo {
public:
float x, y, z;
float bar();
int other();
...etc
};
Foo f;
float result = f.bar()
int obSize = sizeof(Foo);
Yet compile to exactly the same as:
Struct Foo { float x, y, z; };
float Foo_bar(Foo *this);
Foo f;
float result = Foo_bar(&f);
int obSize = sizeof(Foo);
My motivation is to increase readability, yet not suffer a memory penalty for each object of Foo. I'd imagine the class implementation normally obSize would be
obSize = sizeof(float)*3 + sizeof(void*)*number_of_class_methods
Mostly to use c++ classes in memory constrained microcontrollers. However I'd imagine if I got it to work I'd use it for network serialization as well (on same endian machines of course).

Your compiler actually does exactly that for you. It might even be able to optimize optimistically by putting the this pointer in a register instead of pushing it onto the stack (this is at least what MSVC does on Windows), which you wouldn't be able to do with standard C calling convention.
As for:
obSize = sizeof(float)*3 + sizeof(void*)*number_of_class_methods
It is plain false. Did you even try it ?
Even if you had virtual functions, only one pointer to a table of functions would be added to each object (one table per class). With no virtual functions, nothing is added to an object beyond its members (and no function table is generated).
void* represents a pointer to data, not a pointer to code (they need not have the same size)
There is no guarantee that the size of the equivalent C struct is 3 * sizeof(float).

C++ already does what you're talking about for non-polymorphic classes (classes without a virtual method).
Generally speaking, a C++ class will have the same size as a C struct, unless the class contains one or more virtual methods, in which case the overhead will be a single pointer (often called the vptr) for each class instance.
There will also be a single instance of a 'vtbl' that has a set of pointers for each virtual function - but that vtbl will be shared among all objects of that class type (ie., there's a single vtbl per-class type, and the various vptrs for objects of that class will point to the same vtbl instance).
In summary, if your class has no virtual methods, it will be no larger than the same C struct. This fits with the C++ philosophy of not paying for what you don't use.
However, note that non-static member functions in a C++ class do take an extra parameter (the this pointer) that isn't explicitly mentioned in the parameter list - that is essentially what you discuss in your question.
footnote: in C++ classes and structs are the same except for the minor difference of default member accessibility. In the above answer, when I use the term 'class', the behavior applies just as well to structs in C++. When I use the term 'struct' I'm talking about C structs.
Also note that if your classes use inheritance, the 'overhead' of that inheritance depends on the exact variety of inheritance. But as in the difference between polymorphic and non-polymorphic classes, whatever that cost might be, it's only brought in if you use it.

No, your imagination is wrong. Class methods take up no space at all in an object. Why not write a class, and take the sizeof. Then add a few more methods and print the sizeof again. You will see that it hasn't changed. Something like this
First program
class X
{
public:
int y;
void method1() {}
};
int main()
{
cout << sizeof(X) << '\n'; // prints 4
}
Second program
class X
{
public:
int y;
void method1() {}
void method2() {}
void method3() {}
void method4() {}
void method5() {}
void method6() {}
};
int main()
{
cout << sizeof(X) << '\n'; // also prints 4
}

Actually, I believe there is no specific memory penalty with using classes since member functions are stored once for every instance of the class. So your memory footprint would be more like 1*sizeof(void*)*number_of_class_methods + N*sizeof(float)*3 where you have N instances of Foo.
The only time you get an additional penalty is when using virtual functions in which case each object carries around a pointer to a vtable with it.

You need to test, but as far as i know a class instance does only store pointers to its methods if said methods are virtual; otherwise, a struct and a class will take roughly the same amount of memory (bar different alignment done by different compilers etc).

Related

RTTI and virtual functions in c++ . Is the implementation approach of gcc necessary?

While trying to understand the inner workings of virtual function and RTTI, I observed the subsequent fact by examining the gcc compiler:
When structs or classes have a virtual function than the space occupied by them get's expanded by preciding them with a pointer that determines their type.
More over and worse, when multiple inheritance is active, each substructure adds another pointer. So e.g. a structure as:
struct C: public A, public B{...}
creates a structure with the data of A and B plus two pointers.
That is memory and efficiency consuming.
The question is if that is needed.
I don't have experience in working with those arguments but the logic thougths that I would like to ask if are buggy are as follows:
a) when we declare a variable of a certain struct type we know it's type. So when we have to pass that variable by value to another function we know it's type and we may pass it as a plain old structure along with an extra value indicating it's type ( I mean the compiler should take care of that).
b) When we pass a struct by reference or by pointer we know again the starting type and we may pass a pointer along with an extra value indicating the type.
In order to assign a struct passed by value to a narrower type we have just to take the necessary subpart (offsets calculation).
In order to change to a narrower pointer we have to adjust the pointer and change the extra value indicating the type accordingly.
So along that line of thoughs, we need to have the extra type indication only when passing called arguments between functions on the stack and it isn't needed to save them in each struct.
What am am I missing ?
The only case that I see would need the type information stored along with the data is when we use a Union and we need to save any kind of a certain specific list of structures but tha is any case resolved by other means.
So the question is if dynamic type information is needed besides when passing arguments between functions, and if and where eventually that is mandated by the standard.
What am am I missing ?
rtti and dynamic_cast need to be able to deduce the layout of the object from any one of its interfaces.
How will you do that unless each interface encapsulates a pointer to this information?
example of RTTI:
struct A { virtual ~A() = default; }; // virtual base class
struct B : A {}; // derived polymorphic class
A* p = new B();
std::cout << typeid(*p).name() << std::endl;
Exercise for reader:
Which class's name gets printed?
Why?
I think, following code will explain it all:
// in header.h
struct Base {
virtual void foo() = 0;
};
// in impl.cpp
void work(Base* b) {
b->foo();
}
Assume the code above gets compiled into a C++ library. Never touched afterwards.
Now, some time later, we write following code:
#include <header.h>
struct D : Base {
void foo() { std::cout << "Foo!\n"; }
};
void my() {
D* d = new D;
work(d);
}
As you see, we can't possibly pass any type to work() - when it was compiled, D didn't even exist! The elegance of virtual functions through pointers to virtual function table - this is what those pointers are - is that above design is allowed.

do operator methods occupy memory in c++ objects?

Suppose I have some simple classes/structs without anything but data and a select few operators. If I understand, a basic struct with only data in C++, just like C, occupies as much memory as the members. For example,
struct SomeStruct { float data; }
sizeof(SomeStruct) == sizeof(float); // this should evaluate to true
What I'm wondering is if adding operators to the class will make the object larger in memory. For example
struct SomeStruct
{
public:
SomeStruct & operator=(const float f) { data = f; return this; }
private:
float data;
}
will it still be true that sizeof(SomeStruct) == sizeof(float) evaluates to true? Are there any operators/methods which will not increase the size of the objects in memory?
The structure may not necessarily be only as large as its members (consider padding and alignment), but you are basically correct, in that:
Functions are not data, and are not "stored" inside the object type.
That said, watch out for the addition of virtual table pointers in the case where you add a virtual function to your type. This is a one-time size increase for the type, and does not re-apply when you add more virtual functions.
What I'm wondering is if adding operators to the class will make the object larger in memory.
The answer is "it depends".
If the class wasn't polymorphic prior to adding the function and this new function keeps the class non-polymorphic, then adding this non-polymorphic function does nothing to the size of your class instances.
On the other hand, if adding this new function does make your class polymorphic, this addition will make instances of your class bigger. Most C++ implementations use a virtual table, or vtable for short. Each instance of a polymorphic class contains a pointer to the vtable for that class. Instances of non-polymorphic classes don't need and thus don't contain a vtable pointer.
Finally, adding yet another virtual function to a class that is already polymorphic does not make the class instances bigger. This addition does makes the vtable for that class bigger, but the vtable itself isn't a part of the instance. A vtable pointer is a part of the instance, and that pointer is already a part of the class layout because the class is already polymorphic.
When I was learning about C++ and OOP, I read somewhere (some bad source) that objects in C++ are essentially the same thing as C structs with function pointers inside of them.
They may be like that functionally, but if they were really implemented like that, it would have been a huge waste of space since all object instances would have to store the same pointers.
Method code is stored in one central location and C++ just makes it conveniently look like as if each instance had its methods inside of it.
(Operators are essentially functions with different syntax).
Methods and operators defined inside classes do not increase the size of instantiated objects. You can test it out for yourself:
#include <iostream>
using namespace std;
struct A {
int a;
};
struct B {
int a;
//SOME RANDOM METHODS AND OPERATORS
B() : a(1) {cout<<"I'm the constructor and I set 'a' to 1"<<endl;}
void some_method() const { for(int i=0;i<40;i++) cout<<"loop";}
B operator+=(const B& b){
a+=b.a;
return *this;
}
size_t my_size() const { return sizeof(*this);}
};
int main(){
cout<<sizeof(A)<<endl;
cout<<B().my_size()<<endl;
}
Output on a 64 bit system:
4
I'm the constructor and I set 'a' to 1
4
==> No change in size.

asm.js - How should function pointers be implemented

Note: This question is purely about asm.js not about C++ nor any other programming language.
As the title already says:
How should a function pointer be implemented in a efficient way?
I couldn't find anything on the web, so I figured asking it here.
Edit:
I would like to implement virtual functions in the compiler I'm working on.
In C++ I would do something like this to generate a vtable:
#include <iostream>
class Base {
public:
virtual void doSomething() = 0;
};
class Derived : public Base {
public:
void doSomething() {
std::cout << "I'm doing something..." << std::endl;
}
};
int main()
{
Base* instance = new Derived();
instance->doSomething();
return 0;
}
To be more precise; how can I generate a vtable in asm.js without the need of plain JavaScript?
In any case, I would like the "near native" capabilities of asm.js while using function pointers.
The solution may be suitable for computer generated code only.
Looking over how asm.js works, I believe your best bet would be to use the method the original CFront compiler used: compile the virtual methods down to functions that take a this pointer, and use thunks to correct the this pointer before passing it. I'll go through it step by step:
No Inheritance
Reduce methods to special functions:
void ExampleObject::foo( void );
would be transformed into
void exampleobject_foo( ExampleObject* this );
This works fine for non-inheritance based objects.
Single Inheritance
We can easily add support for arbitrary large amount of single inheritance through a simple trick: always store the object in memory base first:
class A : public B
would become, in memory:
[[ B ] A ]
Getting closer!
Multiple Inheritance
Now, multiple inheritance makes this much harder to work with
class A : public B, public C
It's impossible for both B and C to be at the start of A; they simply cannot co-exist. There are two choices:
Store an explicit offset (known as delta) for each call to base.
Do not allow calls through A to B or C
The second choice is much preferable for a variety of reasons; if you are calling a base class member function, it's rare you would want to do it through a derived class. Instead, you could simply go C::conlyfunc, which could then do the adjustment to your pointer for you at no cost. Allowing A::conlyfunc removes important information that the compiler could have used, at very little benefit.
The first choice is used in C++; all multiple inheritance objects call a thunk before each call to a base class, which adjusts the this pointer to point to the subobject inside it. In a simple example:
class ExampleBaseClass
{
void foo( void );
}
class ExampleDerivedClass : public ExampleBaseClass, private IrrelevantBaseClass
{
void bar( void );
}
would then become
void examplebaseclass_foo( ExampleBaseClass* this );
void examplederivedclass_bar( ExampleDerivedClass* this);
void examplederivedclass_thunk_foo( ExampleDerivedClass* this)
{
examplebaseclass_foo( this + delta );
}
This could be inlined in many situations, so it's not too big of overhead. However, if you could never refer to ExampleBaseClass::foo as ExampleDerivedClass::foo, these thunks wouldn't be needed, as the delta would be easily discernible from the call itself.
Virtual functions
Virtual functions adds a whole new layer of complexity. In the multiple inheritance example, the thunks had fixed addresses to call; we were just adjusting the this before passing it to an already known function. With virtual functions, the function we're calling is unknown; we could be overridden by a derived object we have no possibility of knowing about at compile time, due to it being in another translation unit or a library, etc.
This means we need some form of dynamic dispatch for each object that has a virtually overridable function; many methods are possible, but C++ implementations tend to use a simple array of function pointers, or a vtable. To each object that has virtual functions, we add a point to an array as a hidden member, usually at the front:
class A
{
hidden:
void* vtable;
public:
virtual void foo( void );
}
We add thunk functions which redirect to the vtable
void a_foo( A* this )
{
int vindex = 0;
this->vtable[vindex](this);
}
The vtable is then populated with pointers to the functions we actually want to call:
vtable[0] = &A::foo_default; // our baseclass implimentation of foo
In a derived class, if we wish to override this virtual function, all we need to do is change the vtable in our own object, to point to the new function, and it will override in the base class as well:
class B: public A
{
virtual void foo( void );
}
will then do this in the constructor:
((A*)this)->vtable[0] = &B::foo;
Finally, we have support for all forms of inheritance!
Almost.
Virtual Inheritance
There is one final caveat with this implementation: if you continue to allow Derived::foo to be used when what you really mean is Base::foo, you get the diamond problem:
class A : public B, public C;
class B : public D;
class C : public D;
A::DFunc(); // Which D?
This problem can also occur when you use base classes as stateful classes, or when you put function that should be has-a rather than is-a; generally, it's a sign of a need for a restructure. But not always.
In C++, this has a solution that is not very elegant, but works:
class A : public B, public C;
class B : virtual D;
class C : virtual D;
This requires those who implement such classes and hierarchies to think ahead and intentionally make their classes a little slower, to support a possible future usage. But it does solve the problem.
How can we implement this solution?
[ [ D ] [ B ] [ Dptr ] [ C ] [ Dptr ] A ]
Rather than use the base class directly, as in normal inheritance, with virtual inheritance we push all usages of D through a pointer, adding an indirection, while stomping the multiple instantiations into a single one. Notice that both B and C have their own pointer, that both point to the same D; this is because B and C don't know if they are free floating copies or bound in derived objects. The same calls need to be used for both, or virtual functions won't work as expected.
Summary
Transform method calls into function calls with a special this parameter in base classes
Structure objects in memory so single inheritance is no different from no inheritance
Add thunks to adjust this pointers then call base classes for multiple inheritance
Add vtables to classes with virtual methods, and make all calls to methods go through vtable to method (thunk -> vtable -> method)
Deal with virtual inheritance through a pointer-to-baseobject rather than derive object calls
All of this is straightforward in js.asm.
Tim,
I'm by no means an asm.js expert but your question intrigues me. It goes to the hart of object oriented language design. It also seems ironic that you are recreating machine-level problems in the javascript domain.
The solution to your question it seems to me is that you will need to setup an accounting of defined types and functions. In Java this is typically done by decorating bytecode with identifiers that represent the correct class->function mapping of any given object. If you use a Int32 identifier for each class that is defined and an additionaly Int32 identifier for each function defined you can then store these in the object representations on the heap. Your vtable is then no more than the mapping of these combination to specific functions.
I hope this helps you.
I am not very familiar with the exact syntax of asm.js, but here is how I implemented a vtable in a compiler of mine (for x86):
Each object are derived from a struct like this:
struct Object {
VTable *vtable;
};
Then the other types I use will look something like this in c++-syntax:
struct MyInt : Vtable {
int value;
};
which is (in this case) equivalent to:
struct MyInt {
VTable *vtable;
int value;
};
So the final layout of the objects are that at offset 0, I know I have a pointer to a vtable. The vtable I use is simply an array of function pointers, again in C-syntax the type VTable could be defined as follows:
typedef Function *VTable;
Where in C i would use void * instead of Function *, since the actual types of the function will vary. What is left for the compiler to do is:
1: For each type containing virtual functions, create a global vtable and populate it with function pointers to the overridden functions.
2: When an object is created, set the vtable member of the object (at offset 0) to point to the global vtable.
Then when you want to call virtual functions you can do something like this:
(*myObject->vtable[1])(1);
to call the function your compiler has assigned the ID 1 in the vtable (methodB in the example below).
A final example: Let's say we have the following two classes:
class A {
public:
virtual int methodA(int) { ... }
virtual int methodB(int) { ... }
virtual int methodC(int) { ... }
};
class B : public A {
public:
virtual int methodA(int) { ... }
virtual int methodB(int) { ... }
};
The VTable for class A and B can look like this:
A: B:
0: &A::methodA 0: &B::methodA
1: &A::methodB 1: &B::methodB
2: &A::methodC 2: &A::methodC
By using this logic, we know that when we are calling methodB on any type derived from A, we shall call whatever function is located at index 1 in the vtable of that object.
Of course, this solution do not work right away if you want to allow for multiple inheritance, but I am fairly sure it can be extended to do so. After some debugging with Visual Studio 2008, it seems like this is more or less how the vtables are implemented there (of course there it is extended to handle multiple inheritance, I have not tried to figure that out yet).
I hope you get some ideas that can be applied in asm.js at least. Like I said, I do not know exactly how asm.js works, but I have managed to implement this system in x86 assembly and I do not see any issues with implementing it in JavaScript either, so I hope it can be used in asm.js as well.

Static function overloading?

I'll start by saying I understand that that only nonstatic member functions can be virtual, but this is what I want:
A base class defining an interface: so I can use base class pointers to access functions.
For memory management purposes (this is an embedded system with limited ram) I want the overriding functions to be statically allocated. I accept the consequence that with a static function, there will be constraints on how I can manipulate data in the function.
My current thinking is that I may keep a light overloading function by making it a wrapper for a function that actually is static.
Please forbear telling me I need to re-think my design. This is why I am asking the question. If you'd like to tell me I'm better off using c and using callbacks, please direct me to some reading material to explain the pitfalls of using an object oriented approach. Is there a object oriented pattern of design which meets the requirements I have enumerated?
Is there a object oriented pattern of design which meets the requirements I have enumerated?
Yes, plain old virtual functions. Your desire is "the overriding functions to be statically allocated." Virtual functions are statically allocated. That is, the code which implements the functions exists once, and only once, and is fixed at compile/link time. Depending upon your linker command, they are as likely to be stored in flash as any other function.
class I {
public:
virtual void doit() = 0;
virtual void undoit() = 0;
};
class A : public I {
public:
virtual void doit () {
// The code for this function is created statically and stored in the code segment
std::cout << "hello, ";
}
virtual void undoit () {
// ditto for this one
std::cout << "HELLO, ";
}
};
class B : public I {
public:
int i;
virtual void doit() {
// ditto for this one
std::cout << "world\n";
}
virtual void undoit() {
// yes, you got it.
std::cout << "WORLD\n";
}
};
int main () {
B b; // So, what is stored inside b?
// There are sizeof(int) bytes for "i",
// There are probably sizeof(void*) bytes for the vtable pointer.
// Note that the vtable pointer doesn't change size, regardless of how
// many virtual methods there are.
// sizeof(b) is probably 8 bytes or so.
}
For memory management purposes (this is an embedded system with
limited ram) I want the overriding functions to be statically
allocated.
All functions in C++ are always statically allocated. The only exception is if you manually download and utilize a JIT.
Static member functions are just plain functions (like non-member functions), that are inside the namespace of the class. That means you can treat them like plain functions, and the following solution should do:
class Interface
{
public:
void (*function) ();
};
class Implementation: public Interface
{
public:
Implementation()
{
function = impl_function;
}
private:
static void impl_function()
{
// do something
}
};
then
Implementation a;
Interface* b = &a;
b->function(); // will do something...
The problem with this approach is that you would be doing almost what the compiler does for you when you use virtual member functions, just better (needs less code, is less error-prone, and the pointer to the implementation functions are shared). The main difference is that using virtual your function would receive the (invisible) this parameter when called, and you would be able to access the member variables.
Thus, I would recommend to you to simply not do this, and use ordinary virtual methods.
The overhead with virtual functions is two-fold: besides the code for the actual implementations (which resides in the code segment, just like any other function you write), there is the virtual function table, and there are pointers to that table. The virtual function table is present once for each derived class, and its size depends on the number of virtual functions. Every object must carry a pointer to its virtual function table.
My point is, the per-object overhead of virtual functions is the same no matter how many virtual functions you have, or how much code they contain. So the way you arrange your virtual functions should have little impact on your memory consumtion, once you have decided that you want some degree of polymorphism.

C++ Derived polymorphic class - does it contain an entire instance of Base, including a vptr?

Say we have
Class A
{
public:
int _i;
virtual int getI();
};
class B : public A
{
public:
int _j;
virtual int getI();
};
So assuming that the size of a class in memory is the sum of its members (i.e. ignoring padding or whatever might actually happen), what is the size of a B instance? Is it sizeof(A) + sizeof(int) + sizeof(vptr)? Or does a B instance not hold an A vptr in it's personal A instance, so that sizeof(b) would be sizeof(int) + sizeof(int) + sizeof(vptr)?
It's whatever the implementation needs to make the code work. All you
can say is that it is at least 2 * sizeof(int), because objects of
type B contain two ints (and possibly other things). In a typical
implementation, A and B will share a vptr, and the total size will
be just one pointer more than the two ints (modulo padding for
alignment, but on most implementations, I don't think that there will be
any). But that's just a typical implementation; you can't count on it.
Any talk of a vtable is going to be specific to a certain implementation, since even the existance of a vtable isn't specified by the C++ standard - it's an implementation detail.
Generally an object will only have one pointer to a vtable, and that vtable will be shared among all objects of that type. The derived class will contain pointers in the table for each virtual function of the base classes plus each new virtual function that it didn't inherit, but again this is a static table and it is not part of the object.
To actually answer the question, the most likely outcome is sizeof(B) == sizeof(A::_i) + sizeof(B::_j) + sizeof(vptr).
Why would you want to know? If you are going to use that in your program, I'd try to look for a more portable way than calculate it and either add a virtual function that returns the size, or perhaps use the runtime type infomration to get the correct type and then return the size.
(if you vote up or add comments I'll start decorating the answer)
In addition to what James Kanze said, it is perhaps worth mentioning that (on a typical implementation) the B's virtual table will contain A's virtual table at its beginning.
For example:
class A {
virtual void x();
virtual void y();
};
class B : A {
virtual void y();
virtual void z();
};
A's virtual table:
A:x()
A:y()
B's virtual table:
A:x()
B:y()
B:z()
That's why an instance of B can get away with just one virtual table pointer.
BTW, multiple inheritance can complicate things quite a bit and introduce multiple virtual table pointers per object.
Always keep in mind that all this is an implementation detail and you should never write code that depends on it.