Why does adding virtual method increase class size in C++?

Why does adding virtual method increase class size in C++? - c++

#include <iostream>
using namespace std;
class a {
virtual int foo() {
return 0;
}
};
class b {
int foo() {
return 0;
}
};
int main() {
cout << sizeof(b) << endl;
cout << sizeof(a) << endl;
}
Output (with g++ 4.9, -O3):
1
8
I assume the increase in size is due to adding a vpointer. But I thought the compiler would see that a is not actually deriving or being derived from anything, hence there is no need to add the vpointer?

The vpointer is needed because the compiler cannot guarantee an external (e.g. shared) library does not use a derived type. The existence-of-derived-class resolution happens at runtime.

Run-time type information. Any polymorphic class creates extra meta-data in the program to make things like typeof and dynamic_cast work. This is in addition to the virtual function table.

Related

Creating a class member that is automatically calculated from other class members?

I'm an absolute newbee when it comes to programming and I'm trying to teach myself the basics by just solving some easy "problems" in C++.
I have searched the web for an exact answer to my question before posting it here and haven't found one so far, however that may be because of (1).
So, what I'm looking for is a way to declare a class member that gets automatically calculated from other members of the same class, so that the calculated class member can be used just like an explicitly defined class member would. For example imagine a struct called creature that has the properties/members creature.numberofhands, creature.fingersperhand and finally the property creature.totalfingers that automatically gets calculated from the above members.
Heres an example of the closest I got to what I wanted to achieve:
#include <iostream>
typedef struct creature {
int numberofhands;
int fingersperhand;
int totalfingers();
} creature;
int creature::totalfingers()
{
return numberofhands * fingersperhand;
};
int main()
{
creature human;
human.numberofhands = 2;
human.fingersperhand = 5;
printf("%d",human.totalfingers());
return(0);
}
What's really annoying me about this, is that I have to treat the calculated one DIFFERENTLY from the explicitly defined ones, i.e. I have to put "()" after it.
How can I change the code, so I can use: human.totalfingers without ever explicitly defining it?

The simplest option would be to use public member functions and make the actual properties hidden.
Something like this:
class Creature {
public:
Creature(int numhands, int fingersperhand) // constructor
: m_numhands{numhands}, m_fingersperhand{fingersperhand}
{ }
int fingersPerHand() const { return m_fingersperhand; }
int numberOfHands() const { return m_numhands; }
int totalFingers() const { return numberOfHands() * fingersPerHand(); }
private:
const int m_numhands;
const int m_fingersperhand;
};
The private member variables are an implementation detail. Users of the class just use the three public member functions to get the different number of fingers after construction and don't need to care that two of them are returning constant stored numbers and the third returns a calculated value - that's irrelevant to users.
An example of use:
#include <iostream>
int main()
{
Creature human{2, 5};
std::cout << "A human has "
<< human.totalFingers() << " fingers. "
<< human.fingersPerHand() << " on each of their "
<< human.numberOfHands() << " hands.\n";
return 0;
}
If - as per your comment - you don't want to use a constructor (although that's the safest way to ensure you don't forget to initialize a member), you can modify the class like this:
class CreatureV2 {
public:
int fingersPerHand() const { return m_fingersperhand; }
int numberOfHands() const { return m_numhands; }
int totalFingers() const { return numberOfHands() * fingersPerHand(); }
void setFingersPerHand(int num) { m_fingersperhand = num; }
void setNumberOfHands(int num) { m_numhands = num; }
private:
// Note: these are no longer `const` and I've given them default
// values matching a human, so if you do nothing you'll get
// human hands.
int m_numhands = 2;
int m_fingersperhand = 5;
};
Example of use of the modified class:
#include <iostream>
int main()
{
CreatureV2 human;
std::cout << "A human has "
<< human.totalFingers() << " fingers. "
<< human.fingersPerHand() << " on each of their "
<< human.numberOfHands() << " hands.\n";
CreatureV2 monster;
monster.setFingersPerHand(7);
monster.setNumberOfHands(5);
std::cout << "A monster has "
<< monster.totalFingers() << " fingers. "
<< monster.fingersPerHand() << " on each of their "
<< monster.numberOfHands() << " hands.\n";
CreatureV2 freak;
freak.setFingersPerHand(9);
// Note: I forgot to specify the number of hands, so a freak get
// the default 2.
std::cout << "A freak has "
<< freak.totalFingers() << " fingers. "
<< freak.fingersPerHand() << " on each of their "
<< freak.numberOfHands() << " hands.\n";
return 0;
}
Note: all of the above assumes you are using a modern C++14 compiler.

What you have described is one of the reasons why encapsulation and "member variables should be private" is the recommended way of doing things in C++.
If every variable is accessed through a function, then everything is consistent, and refactoring from a member variable to a computation is possible.
Some languages, like C# or D, have the concept of "properties", which provide a way around the issue, but C++ does not have such a construct.

For fun, the proxy way to avoid extra parenthesis, (but with some extra costs):
class RefMul
{
public:
RefMul(int& a, int& b) : a(a), b(b) {}
operator int() const { return a * b; }
private:
int& a;
int& b;
};
struct creature {
int numberofhands;
int fingersperhand;
RefMul totalfingers{numberofhands, fingersperhand};
};
Demo
Note: to use RefMul with printf, you have to cast to int:
printf("%d", int(human.totalfingers));
That cast would not be required if you use c++ way to print:
std::cout << human.totalfingers;

If you're after consistency, you can make your changes the other way around. Replace the two member variables with constant methods which simply return copies of the member variables. That way, the way you access data is consistent and you don't have to worry about some code changing the values of the member variables when it shouldn't.

Others have provided very good answers. If you are looking for consistency, probably the easiest way is to make use of member functions (as #Jesper Juhl has answered).
On the other hand, if you strictly want to use class members that are calculated automatically from other members, you can use properties. Properties (as are defined in C# and Groovy) are not a standard feature of C++ but there are ways to implement them in C++. This SO question has a very good overview of the ways that properties can be defined and used in C++. My favorite way of defining properties is taking advantage of Microsoft-specific extension for properties in Visual C++ (obviously, this approach is specific to Microsoft Visual C++). A documentation of properties in Visual C++ can be found in MSDN. Using properties in Visual C++, your code can be modified to:
struct creature {
int numberofhands; // use of public member variables are generally discouraged
int fingersperhand;
__declspec(property(get = get_totalfingers)) // Microsoft-specific
int totalfingers;
private:
int fingers;
int get_totalfingers()
{
return numberofhands * fingersperhand; // This is where the automatic calculation takes place.
}
};
This class can be used like this:
#include <iostream>
int main()
{
creature martian;
martian.numberofhands = 2;
martian.fingersperhand = 4; // Marvin the Martian had 4!
// This line will print 8
std::cout << "Total fingers: " << martian.totalfingers << std::endl;
return 0;
}
As I said earlier, properties are not a standard feature of C++ but there are ways to get them in C++ which either rely on smart tricks or using compiler-specific features. IMHO, using simple functions (as #Jesper Juhl described) is a better alternative.

Change vtable memory after constructor allocation?

This was a question asked to me in an interview...
Is it possible to change the vtable memory locations after it's
created via constructor? If yes, is it a good idea? And how to do that?
If not, why not?
As i don't have that in depth idea about C++, my guess is that, it's not possible to change vtable after it's created!
Can anyone explain?

The C++ standards do not tell us how dynamic dispatch must be implemented. But vtable is the most common way.
Generally, the first 8 bytes of the object is used to store the pointer to vtable, but only if the object has at least 1 virtual function (otherwise we can save this 8 bytes for something else). And it is not possible to change the records in the vtable during run-time.
But you have memset or memcpy like functions and can do whatever you want (change the vtable pointer).
Code sample:
#include <bits/stdc++.h>
class A {
public:
virtual void f() {
std::cout << "A::f()" << std::endl;
}
virtual void g() {
std::cout << "A::g()" << std::endl;
}
};
class B {
public:
virtual void f() {
std::cout << "B::f()" << std::endl;
}
virtual void g() {
std::cout << "B::g()" << std::endl;
}
};
int main() {
std::ios_base::sync_with_stdio(false);
std::cin.tie(nullptr);
A * p_a = new A();
B * p_b = new B();
p_a->f();
p_a->g();
p_b->f();
p_b->g();
size_t * vptr_a = reinterpret_cast<size_t *>(p_a);
size_t * vptr_b = reinterpret_cast<size_t *>(p_b);
std::swap(*vptr_a, *vptr_b);
p_a->f();
p_a->g();
p_b->f();
p_b->g();
return 0;
}
Output:
A::f()
A::g()
B::f()
B::g()
B::f()
B::g()
A::f()
A::g()
https://ideone.com/CEkkmN
Of course all these manipulations is the way to shoot yourself in the foot.

The proper response to that question is simple: This question can't be answered. The question talks about the "vtable memory locations" and then continues with "after it's created via constructor". This sentence doesn't make sense because "locations" is plural while "it" can only refer to a singular.
Now, if you have a question concerning the typical C++ implementations that use a vtable pointer, please feel free to ask. I'd also consider reading The Design and Evolution of C++, which contains a bunch of background infos for people that want to understand how C++ works and why.

When is the value of "this" shifted by an offset?

I was wondering whether assert( this != nullptr ); was a good idea in member functions and someone pointed out that it wouldn’t work if the value of this had been added an offset. In that case, instead of being 0, it would be something like 40, making the assert useless.
When does this happen though?

Multiple inheritance can cause an offset, skipping the extra v-table pointers in the object. The generic name is "this pointer adjustor thunking".
But you are helping too much. Null references are very common bugs, the operating system already has an assert built-in for you. Your program will stop with a segfault or access violation. The diagnostic you'll get from the debugger is always good enough to tell you that the object pointer is null, you'll see a very low address. Not just null, it works for MI cases as well.

this adjustment can happen only in classes that use multiple-inheritance. Here's a program that illustrates this:
#include <iostream>
using namespace std;
struct A {
int n;
void af() { cout << "this=" << this << endl; }
};
struct B {
int m;
void bf() { cout << "this=" << this << endl; }
};
struct C : A,B {
};
int main(int argc, char** argv) {
C* c = NULL;
c->af();
c->bf();
return 0;
}
When I run this program I get this output:
this=0
this=0x4
That is: your assert this != nullptr will not catch the invocation of c->bf() where c is nullptr because the this of the B sub-object inside the C object is shifted by four bytes (due to the A sub-object).
Let's try to illustrate the layout of a C object:
0: | n |
4: | m |
the numbers on the left-hand-side are offsets from the object's beginning. So, at offset 0 we have the A sub-object (with its data member n). at offset 4 we have the B sub-objects (with its data member m).
The this of the entire object, as well as the this of the A sub-object both point at offset 0. However, when we want to refer to the B sub-object (when invoking a method defined by B) the this value need to be adjusted such that it points at the beginning of the B sub-object. Hence the +4.

Note this is UB anyway.
Multiple inheritance can introduce an offset, depending on the implementation:
#include <iostream>
struct wup
{
int i;
void foo()
{
std::cout << (void*)this << std::endl;
}
};
struct dup
{
int j;
void bar()
{
std::cout << (void*)this << std::endl;
}
};
struct s : wup, dup
{
void foobar()
{
foo();
bar();
}
};
int main()
{
s* p = nullptr;
p->foobar();
}
Output on some version of clang++:
0
0x4
Live example.
Also note, as I pointed out in the comments to the OP, that this assert might not work for virtual function calls, as the vtable isn't initialized (if the compiler does a dynamic dispatch, i.e. doesn't optimize if it know the dynamic type of *p).

Here is a situation where it might happen:
struct A {
void f()
{
// this assert will probably not fail
assert(this!=nullptr);
}
};
struct B {
A a1;
A a2;
};
static void g(B *bp)
{
bp->a2.f(); // undefined behavior at this point, but many compilers will
// treat bp as a pointer to address zero and add sizeof(A) to
// the address and pass it as the this pointer to A::f().
}
int main(int,char**)
{
g(nullptr); // oops passed null!
}
This is undefined behavior for C++ in general, but with some compilers, it might have the
consistent behavior of the this pointer having some small non-zero address inside A::f().

Compilers typically implement multiple inheritance by storing the base objects sequentially in memory. If you had, e.g.:
struct bar {
int x;
int something();
};
struct baz {
int y;
int some_other_thing();
};
struct foo : public bar, public baz {};
The compiler will allocate foo and bar at the same address, and baz will be offset by sizeof(bar). So, under some implementation, it's possible that nullptr -> some_other_thing() results in a non-null this.
This example at Coliru demonstrates (assuming the result you get from the undefined behavior is the same one I did) the situation, and shows an assert(this != nullptr) failing to detect the case. (Credit to #DyP who I basically stole the example code from).

I think its not that bad a idea to put assert, for example atleast it can catch see below example
class Test{
public:
void DoSomething() {
std::cout << "Hello";
}
};
int main(int argc , char argv[]) {
Test* nullptr = 0;
nullptr->DoSomething();
}
The above example will run without error, If more complex becomes difficult to debug if that assert is absent.
I am trying to make a point that null this pointer can go unnoticed, and in complex situation becomes difficult to debug , I have faced this situation.

calling virtual functions through pointers with and without consulting the VM-table

I want to take the address of a member function of a c++ class, store it in a pointer, and call the virtual function later on.
I know some things about it, but do not now how to take the address of a certain implementation of a virtual function that is NOT the implementation of the most descendent class (the actual class of the object).
Here is some sample code:
#include <iostream>
using namespace std;
class ca
{
public:
virtual void vfunc() {cout << "a::vfunc ";}
void mfunc() {cout << "a::mfunc ";}
};
class cb : public ca
{
public:
virtual void vfunc() {cout << "b::vfunc ";}
};
extern "C" int main(int, char **)
{
void (ca:: *ptr_to_vfunc)() = &ca::vfunc;
cout << sizeof(ptr_to_vfunc) << " ";
cb b;
(b.*ptr_to_vfunc)();
ca a;
(a.*ptr_to_vfunc)();
void (ca:: *ptr_to_mfunc)() = &ca::mfunc;
cout << sizeof(ptr_to_mfunc) << " ";
(a.*ptr_to_mfunc)();
}
The output is:
12 b::vfunc a::vfunc 12 a::mfunc
I am working with win32-environment, and the size of member function pointers is 3 * 32-bits values! I did not specify an object when I took the address of the member function and yet, my call invokes the most descendant class' implementation of vfunc().
1) What is going on here? Why 12 bytes in stead of 4?
2) How can I take the address of ca::vfunc() and call it on b, like I normaly would do with b.ca::vfunc().

Ok: Its doing exactly what it it is supposed to do.
But to answer you questions:
1) What is going on here? Why 12 bytes in stead of 4?
Why not.
The standard does not specify a size.
I am not sure why you expect a normal pointer to be 4.
If the question is "why is a method pointer larger than a normal pointer?"
Because the implementation needs the extra space to hold information about the call.
2) How can I take the address of ca::vfunc() and call it on b, like I normaly would do with b.ca::vfunc().
You cant.

Virtual member functions and std::tr1::function: How does this work?

Here is a sample piece of code. Note that B is a subclass of A and both provide a unique print routine. Also notice in main that both bind calls are to &A::print, though in the latter case a reference to B is passed.
#include <iostream>
#include <tr1/functional>
struct A
{
virtual void print()
{
std::cerr << "A" << std::endl;
}
};
struct B : public A
{
virtual void print()
{
std::cerr << "B" << std::endl;
}
};
int main (int argc, char * const argv[])
{
typedef std::tr1::function<void ()> proc_t;
A a;
B b;
proc_t a_print = std::tr1::bind(&A::print, std::tr1::ref(a));
proc_t b_print = std::tr1::bind(&A::print, std::tr1::ref(b));
a_print();
b_print();
return 0;
}
Here is the output I see compiling with GCC 4.2:
A
B
I would consider this correct behavior, but I am at a loss to explain how it is working properly given that the std::tr1::functions were bound to &A::print in both cases. Can someone please enlighten me?
EDIT: Thanks for the answers. I am familiar with inheritance and polymorphic types. What I am interested in is what does &A::print mean? Is it an offset into a vtable, and that vtable changes based on the referred object (in this case, a or b?) From a more nuts-and-bolts perspective, how does this code behave correctly?

This works in the same manner as it would have worked with plain member function pointers. The following produces the same output:
int main ()
{
A a;
B b;
typedef void (A::*fp)();
fp p = &A::print;
(a.*p)(); // prints A
(b.*p)(); // prints B
}
It would have been surprising if boost/tr1/std::function did anything different since they presumably store these pointers to member functions under the hood. Oh, and of course no mention of these pointers is complete without a link to the Fast Delegates article.

Because print() is declared virtual, A is a polymorphic class. By binding to the print function pointer, you will be calling through an A pointer, much in the same way as:
A* ab = &b;
ab->print();
In the ->print call above, you would expect polymorphic behavior. Same it true in your code as well. And this is a Good Thing, if you ask me. At least, most of the time. :)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Why does adding virtual method increase class size in C++? - c++

The vpointer is needed because the compiler cannot guarantee an external (e.g. shared) library does not use a derived type. The existence-of-derived-class resolution happens at runtime.

Run-time type information. Any polymorphic class creates extra meta-data in the program to make things like typeof and dynamic_cast work. This is in addition to the virtual function table.

Related

Creating a class member that is automatically calculated from other class members?

Change vtable memory after constructor allocation?

When is the value of "this" shifted by an offset?

calling virtual functions through pointers with and without consulting the VM-table

Virtual member functions and std::tr1::function: How does this work?

Categories

Resources