Find upper bound of vtable size at runtime - c++

The virtual table of a C++ class depends on the number of virtual functions defined.
Do you have any thoughts on how to get an upper bound of the vtable size at runtime?
Say I have a pointer to an object. I know its public virtual functions from the header file, but I'm not sure how many protected/private virtual functions there are.
One idea I have is to read sequentially along the vtable until:
I get an access violation; this could then serve as an upper bound of the vtable size.
It is NULL (but NULL could also be a pure virtual function).
Edit:
Previously I asked a question here: C++ COM Object Hotpatching?
I haven't gotten a satisfactory answer, so I thought of a way myself, which requires hacking the vtable pointer.
What I want to do is add a variable to a C++ object at runtime. The only thing I know about the object is that it has a vtable pointer. To add that field, I plan to point the vtable pointer to another place where I store a copy of its vtable, and to store my added variable just in front of (upstream of) this new vtable. That's the only solution I can think of.
The wrapping solution is not safe, because for COM objects, problems will arise if the object has many interfaces and QueryInterface is called.
And the idea of keeping the new variable in a map would require a map lookup every time the variable is accessed.
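For comparison with the vtable-patching plan, the map-based alternative the question mentions can be sketched in a few lines. This is a minimal sketch, assuming a single extra int per object; the class name ExtraFieldTable is hypothetical. For a COM object, the key would have to be the IUnknown pointer obtained through QueryInterface, because COM guarantees that pointer is identical for every interface of the same object; keying on an arbitrary interface pointer would bring back the multiple-interface problem.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical side table: associates one extra int with any object, keyed by
// the object's address, instead of patching the object's vtable pointer.
// Not thread-safe; a real version would need a lock or per-thread tables.
class ExtraFieldTable {
public:
    void set(const void* obj, int value) {
        assert(obj != nullptr);
        fields_[obj] = value;
    }

    // Returns the stored value, or nullptr if nothing was set for obj.
    // Every access costs a hash lookup, which is the overhead the question mentions.
    const int* get(const void* obj) const {
        auto it = fields_.find(obj);
        return it == fields_.end() ? nullptr : &it->second;
    }

    // Call this when the object is destroyed; otherwise a new object created
    // at the same address would silently inherit the stale value.
    void erase(const void* obj) { fields_.erase(obj); }

private:
    std::unordered_map<const void*, int> fields_;
};
```

The trade-off is exactly the one raised above: no layout hacks and no dependence on the vtable, at the cost of a lookup per access and explicit cleanup when the object dies.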

Related

Enforcing a vftable entry in windbg "x /2" results, what to consider?

(This is quite a large question about software design. In case it's not suited for StackOverflow I'm willing to copy it to the Software-Engineering community)
I'm working with heap_stat, a script that investigates dumps. This script is based on the idea that, for any object which has a virtual function, the vftable field is always the first one (which makes it possible to determine the class of an object from its memory).
In my applications there are some objects that have vftable entries (typically every STL object has one), but there are also quite a few objects that don't.
To force the presence of a vftable field, I've done the following test:
I created a nonsense class with a virtual function and made my class inherit from it:
class NONSENSE {
virtual int nonsense() { return 0; }
};
class Own_Class : public NONSENSE, ...
This, as expected, created a vftable entry in the symbols, which I could find using WinDbg's x /2 *!Own_Class*vftable* command:
00000000`012da1e0 Own_Application!Own_Class::`vftable'
I also saw a difference in memory usage:
sizeof(a normal Own_Class object) = 2928
sizeof(inherited Own_Class object) = 2936
=> 8 bytes have been added for this object.
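The 8-byte difference is the hidden vtable pointer on a 64-bit build. A minimal sketch that reproduces the measurement with the question's NONSENSE base; PlainData and ForcedVptrData are hypothetical stand-ins for Own_Class, and the exact overhead depends on the ABI and on alignment padding.

```cpp
#include <cassert>
#include <cstddef>

// The question's helper base: its only job is to give derived classes a vtable pointer.
class NONSENSE {
    virtual int nonsense() { return 0; }
};

// Hypothetical stand-ins for Own_Class without and with the forced vtable pointer.
struct PlainData { int value[4]; };
struct ForcedVptrData : NONSENSE { int value[4]; };

// Size added by inheriting from NONSENSE: typically one pointer
// (8 bytes on x64, 4 on x86), plus any alignment padding.
constexpr std::size_t vptr_overhead = sizeof(ForcedVptrData) - sizeof(PlainData);
```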
There's a catch: apparently quite a few classes are declared as:
class ATL_NO_VTABLE Own_Class
This ATL_NO_VTABLE blocks the creation of the vftable entry, which means the following (ATL_NO_VTABLE equals __declspec(novtable)):
// __declspec(novtable) is used on a class declaration to prevent the vtable
// pointer from being initialized in the constructor and destructor for the
// class. This has many benefits because the linker can now eliminate the
// vtable and all the functions pointed to by the vtable. Also, the actual
// constructor and destructor code are now smaller.
In my opinion, this means that the vftable does not get created, so object methods are called more directly, which affects the speed of method execution and stack handling. Allowing the vftable to be created has the following impact:
Not to be taken into account:
There is one more call on the stack; this only matters for systems that are already at the limit of their memory usage. (I have no idea how the linker points to a particular method.)
The CPU usage increase will be too small to be seen.
The speed decrease will be too small to be seen.
To be taken into account:
As mentioned before, the memory usage of the application increases by 8 bytes per object. For a regular object of some 1000 bytes, this is a memory usage increase of about 1%, but for objects smaller than 80 bytes, the increase could be 10% or more.
Now I have the following questions:
Is my analysis on the impact correct?
Is there a better way to force the creation of the vftable field, having less impact?
Did I miss anything?
Thanks in advance
Is my analysis on the impact correct?
No. __declspec(novtable) omits generation of the vtable itself for a given class; the pointer to the vtable still exists, so sizeof will not change.
__declspec(novtable) is meant to be used for base classes that have derived classes, so that the constructor of the derived class sets the vtable pointer to the derived vtable and the base vtable is not needed.
So this optimization eliminates one pointer assignment (in the compiler-generated part of the constructor) and a bit of space for the vtable itself. It is not very useful for your goal of a per-object optimization, as it only provides a small per-class optimization.
It works as long as you don't create instances of the base class on their own and don't call virtual methods in the constructor/destructor.
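A minimal sketch of the intended usage pattern, assuming an interface-style base that is never instantiated by itself and makes no virtual calls from its constructor or destructor. __declspec(novtable) only exists on MSVC, so the hypothetical MAYBE_NOVTABLE macro expands to nothing on other compilers; the observable behavior is the same either way, which is the point: the derived object still carries its vtable pointer.

```cpp
#include <cassert>

// MSVC-only attribute; on other compilers this hypothetical macro is empty.
#if defined(_MSC_VER)
#define MAYBE_NOVTABLE __declspec(novtable)
#else
#define MAYBE_NOVTABLE
#endif

// Interface-style base: never created on its own, no virtual calls in its
// constructor or destructor, which is when novtable is safe to use.
struct MAYBE_NOVTABLE Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// novtable drops Shape's own table, not the pointer: a Triangle still has a
// vtable pointer, and this call is dispatched through Triangle's vtable.
int count_sides(const Shape& s) { return s.sides(); }
```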
Omitting virtual function calls by making them non-virtual is a completely separate story. It is called devirtualization: when the compiler can be sure which class an instance belongs to, it replaces virtual calls with non-virtual ones.
__declspec(novtable) cannot help devirtualization in any way. The final / sealed keywords may help devirtualization, as they state that there are no further derived classes or overriding methods.
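A small sketch of where final can help. Animal and Dog are hypothetical; whether the compiler actually turns the second call into a direct call (or inlines it) is an optimizer decision, so the code only shows where devirtualization becomes legal, not that it happens.

```cpp
#include <cassert>

struct Animal {
    virtual ~Animal() = default;
    virtual int legs() const = 0;
};

// 'final' promises that no class derives from Dog, so Dog::legs
// cannot be overridden any further.
struct Dog final : Animal {
    int legs() const override { return 4; }
};

// Through an Animal&, the dynamic type is unknown: normally a vtable call.
int legs_via_base(const Animal& a) { return a.legs(); }

// Through a Dog&, the only possible target is Dog::legs, so the compiler
// is allowed to call (or inline) it directly.
int legs_via_final(const Dog& d) { return d.legs(); }
```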
Regarding the assumption that the vtable pointer is the first member: this may be wrong. The vtable pointer will not be first if your base classes don't have a vtable but do have some data members. Also, there may be more than one vtable pointer.
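Multiple vtable pointers appear as soon as a class has more than one polymorphic base. A sketch under the usual MSVC and Itanium (GCC/Clang) layouts; Readable, Writable and File are hypothetical names. A script that only reads the first pointer of an object sees just one of these pointers, and a Writable* obtained from a File points into the middle of the File object rather than at its start.

```cpp
#include <cassert>

struct Readable {
    virtual ~Readable() = default;
    virtual int read() { return 1; }
};

struct Writable {
    virtual ~Writable() = default;
    virtual int write() { return 2; }
};

// Two polymorphic bases: in common ABIs the object holds two vtable
// pointers, one at the start of each base subobject.
struct File : Readable, Writable {};

// True if converting File* to Writable* moves the pointer, i.e. the Writable
// subobject (with its own vtable pointer) is not at the start of the object.
bool writable_part_is_offset(File& f) {
    const void* whole = &f;
    const void* as_writable = static_cast<Writable*>(&f);
    return whole != as_writable;
}
```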
To analyze structures in a dump programmatically, I would recommend using a proper API. There are two: the DIA SDK and the dbghelp functions. They are similar, but the first is object-based (COM) and the second is a flat API, so the first may be easier to use.
As the approach of the heap_stat script is inherently limited, I would recommend using UMDH for heap analysis instead; it does not rely on vtables at all and shows all kinds of objects.
In the meantime, I've found a terribly easy way to force vftable entries for every class: just declare every destructor as virtual.
To find all destructors that are not yet virtual, I ran the following command in my Ubuntu app within my development directory:
find ./ -name "*.h" -exec fgrep "~" {} /dev/null \; | grep -v "virtual"
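A quick way to confirm that a virtual destructor alone is enough is std::is_polymorphic, which is true exactly when a class declares or inherits a virtual function. Record, VirtualRecord and DerivedRecord below are hypothetical. The sketch also shows the usual reason for virtual destructors, deleting through a base pointer, which may matter more than the vftable entry itself.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// No virtual functions: no vtable pointer.
struct Record { int id = 0; ~Record() {} };
// A virtual destructor alone makes the class polymorphic (adds a vtable pointer).
struct VirtualRecord { int id = 0; virtual ~VirtualRecord() {} };

static_assert(!std::is_polymorphic<Record>::value, "no vptr expected");
static_assert(std::is_polymorphic<VirtualRecord>::value, "vptr expected");

// Deleting a derived object through a base pointer now runs the derived
// destructor (with a non-virtual base destructor this is undefined behavior).
struct DerivedRecord : VirtualRecord {
    explicit DerivedRecord(bool* flag) : destroyed(flag) {}
    ~DerivedRecord() override { *destroyed = true; }
    bool* destroyed;
};

bool destroys_through_base() {
    bool destroyed = false;
    {
        std::unique_ptr<VirtualRecord> p = std::make_unique<DerivedRecord>(&destroyed);
    }  // p is destroyed here, via VirtualRecord's virtual destructor
    return destroyed;
}
```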
After declaring all destructors as virtual, I'm planning to do some performance testing (I believe that declaring a method as virtual might affect speed, since the method declaration has changed, especially for a server application under heavy load). I'll keep this post up to date.

virtual table and _vptr storage scheme

Can someone explain how the virtual tables for the different classes are stored in memory? When we call a function using a pointer, how is the call made to the function using its address? Can we get the memory size of a virtual table using a class pointer? I want to see how many memory blocks are used by the virtual table of a class. How can I see it?
class Base
{
public:
// FunctionPointer *__vptr;  (hidden pointer added by the compiler; shown only for illustration)
virtual void function1() {};
virtual void function2() {};
};
class D1: public Base
{
public:
virtual void function1() {};
};
class D2: public Base
{
public:
virtual void function2() {};
};
int main()
{
D1 d1;
Base *dPtr = &d1;
dPtr->function1();
}
Thanks in advance!
The first point to keep in mind is a disclaimer: none of this is actually guaranteed by the standard. The standard says what the code needs to look like and how it should work, but doesn't actually specify exactly how the compiler needs to make that happen.
That said, essentially all C++ compilers work quite similarly in this respect.
So, let's start with non-virtual functions. They come in two kinds: static and non-static.
The simpler of the two are static member functions. A static member function is almost like a global function that's a friend of the class, except that it also needs the class's name as a prefix to the function name.
Non-static member functions are a little more complex. They're still normal functions that are called directly--but they're passed a hidden pointer to the instance of the object on which they were called. Inside the function, you can use the keyword this to refer to that instance data. So, when you call something like a.func(b);, the code that's generated is pretty similar to code you'd get for func(a, b);
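To make the a.func(b) versus func(a, b) analogy concrete, here is a sketch; Counter_add is a hypothetical name for the function the compiler effectively generates (real compilers use mangled names).

```cpp
#include <cassert>

struct Counter {
    int value = 0;
    // Member function: the object is passed implicitly as 'this'.
    void add(int n) { value += n; }
};

// Roughly what Counter::add compiles to: an ordinary function that takes
// the object pointer as an explicit first parameter.
void Counter_add(Counter* self, int n) { self->value += n; }
```

Calling a.add(2) and Counter_add(&b, 2) leave a and b in the same state; no vtable is involved in either call.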
Now let's consider virtual functions. Here's where we get into vtables and vtable pointers. We have enough indirection going on that it's probably best to draw some diagrams to see how it's all laid out. Here's pretty much the simplest case: one instance of one class with two virtual functions:
So, the object contains its data and a pointer to the vtable. The vtable contains a pointer to each virtual function defined by that class. It may not be immediately apparent, however, why we need so much indirection. To understand that, let's look at the next (ever so slightly) more complex case: two instances of that class:
Note how each instance of the class has its own data, but they both share the same vtable and the same code--and if we had more instances, they'd still all share the one vtable among all the instances of the same class.
Now, let's consider derivation/inheritance. As an example, let's rename our existing class to "Base", and add a derived class. Since I'm feeling imaginative, I'll name it "Derived". As above, the base class defines two virtual functions. The derived class overrides one (but not the other) of those:
Of course, we can combine the two, having multiple instances of each of the base and/or derived class:
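The layout those diagrams describe can be written out by hand. This is a minimal sketch, assuming one table slot per virtual function and ignoring destructors, RTTI and multiple inheritance; VTable, Object and the base_/derived_ functions are hypothetical names for what a compiler generates and hides.

```cpp
#include <cassert>
#include <string>

struct Object;

// One table of function pointers per class, shared by all its instances.
struct VTable {
    std::string (*function1)(const Object*);
    std::string (*function2)(const Object*);
};

// Each instance: a pointer to its class's table plus its own data.
struct Object {
    const VTable* vptr;
    int data;
};

std::string base_f1(const Object*) { return "Base::function1"; }
std::string base_f2(const Object*) { return "Base::function2"; }
std::string derived_f1(const Object*) { return "Derived::function1"; }

// Derived overrides function1 only, so its table reuses base_f2 in slot 2.
const VTable base_vtable{base_f1, base_f2};
const VTable derived_vtable{derived_f1, base_f2};

// A "virtual call": follow the object's vptr, then call through a fixed slot.
std::string call_function1(const Object* o) { return o->vptr->function1(o); }
std::string call_function2(const Object* o) { return o->vptr->function2(o); }
```

Two Base instances hold the same vptr value and different data, while a Derived instance reaches derived_f1 through the same slot the Base instances use for base_f1.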
Now let's delve into that in a little more detail. The interesting thing about derivation is that we can pass a pointer/reference to an object of the derived class to a function written to receive a pointer/reference to the base class, and it still works--but if you invoke a virtual function, you get the version for the actual class, not the base class. So, how does that work? How can we treat an instance of the derived class as if it were an instance of the base class, and still have it work? To do it, each derived object has a "base class subobject". For example, let's consider code like this:
struct simple_base {
int a;
};
struct simple_derived : public simple_base {
int b;
};
In this case, when you create an instance of simple_derived, you get an object containing two ints: a and b. The a (base class part) is at the beginning of the object in memory, and the b (derived class part) follows that. So, if you pass the address of the object to a function expecting an instance of the base class, it uses only the part(s) that exist in the base class, which the compiler places at the same offsets in the object as they'd be in an object of the base class, so the function can manipulate them without even knowing that it's dealing with an object of the derived class. Likewise, if you invoke a virtual function, all it needs to know is the location of the vtable pointer. As far as it cares, something like Base::func1 basically just means it follows the vtable pointer, then uses a pointer to a function at some specified offset from there (e.g., the fourth function pointer).
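A sketch of the base-class-subobject idea using the answer's own simple_base and simple_derived; read_a is a hypothetical function written only against simple_base, yet it works unchanged on a simple_derived.

```cpp
#include <cassert>

struct simple_base {
    int a;
};

struct simple_derived : public simple_base {
    int b;
};

// Knows nothing about simple_derived: it reads 'a' at its offset inside
// simple_base, which is where 'a' also lives inside a simple_derived.
int read_a(const simple_base& base) { return base.a; }
```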
At least for now, I'm going to ignore multiple inheritance. It adds quite a bit of complexity to the picture (especially when virtual inheritance gets involved) and you haven't mentioned it at all, so I doubt you really care.
As to accessing any of this, or using in any way other than simply calling virtual functions: you may be able to come up with something for a specific compiler--but don't expect it to be portable at all. Although things like debuggers often need to look at such stuff, the code involved tends to be quite fragile and compiler-specific.
The virtual table is supposed to be shared between instances of a class. More precisely, it lives at the "class" level rather than the instance level. Each instance carries the overhead of a pointer to the virtual table if its class hierarchy contains virtual functions (or virtual base classes).
The table itself is at least the size necessary to hold a pointer for each virtual function. Other than that, it is an implementation detail how it's actually defined. Check here for a SO question with more details about this.
First of all, the following answer contains almost everything you want to know regarding virtual tables:
https://stackoverflow.com/a/16097013/8908931
If you are looking for something a little more specific (with the regular disclaimer that this might change between platforms, compilers, and CPU architectures):
When needed, a virtual table is created for a class. The class has only one instance of the virtual table, and each object of the class has a pointer to the memory location of this virtual table. The virtual table itself can be thought of as a simple array of pointers.
When you assign the derived pointer to the base pointer, the object being pointed to still contains its vtable pointer, so a call made through the base pointer still reaches the virtual table of the derived class. The compiler directs the call to an offset into the virtual table, which contains the actual address of the function from the derived class.
Not really. Usually at the start of an object, there is a pointer to the virtual table itself. But this will not help you too much, as it is just an array of pointers, with no real indication of its size.
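For inspection only, the hidden pointer can be peeked at, which shows the sharing but, as said, gives a start address and no length. This is an implementation-specific sketch, assuming the vtable pointer occupies the first pointer-sized bytes of the object; that holds for single inheritance under MSVC and the Itanium ABI (GCC, Clang) but is not guaranteed by the standard. The classes follow the question's names, with return values added (a hypothetical change) so the functions cannot be merged by the linker.

```cpp
#include <cassert>
#include <cstring>

struct Base {
    virtual ~Base() = default;
    virtual int function1() { return 1; }
    virtual int function2() { return 2; }
};
struct D1 : Base { int function1() override { return 11; } };
struct D2 : Base { int function2() override { return 22; } };

// Copies the first pointer-sized bytes of the object, which (under the
// assumption above) are the vtable pointer. The result identifies the class's
// table, but says nothing about how many entries it has.
const void* peek_vptr(const Base& obj) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, &obj, sizeof vptr);
    return vptr;
}
```

Two D1 objects yield the same value, and a D2 object yields a different one.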
Making a very long answer short: For an exact size you can find this information in the executable (or in segments loaded from it to the memory). With enough knowledge of how the virtual table works, you can get a pretty accurate estimation, given you know the code, the compiler, and the target architecture.
For the exact size, you can find this information either in the executable or in the memory segments loaded from it. On Linux, an executable is usually an ELF file; such files contain the information needed to run a program. Part of this information is symbols for various kinds of language constructs such as variables, functions, and virtual tables, and for each symbol it records the size it takes in memory. So, bottom line: you will need the symbol name of the virtual table and enough knowledge of ELF to extract what you want.
The answer that Jerry Coffin gave is excellent in explaining how virtual function pointers work to achieve runtime polymorphism in C++. However, I believe that it is lacking in answering where in memory the vtable is stored. As others have pointed out this is not dictated by the standard.
However, there are excellent blog posts by Martin Kysel that go into great detail about where virtual tables are stored. To summarize them:
One vtable is created for every class (not instance) with virtual functions. Each instance of this class points to the same vtable in memory
Each vtable is stored in read only memory of the resulting binary file
The disassembly for each function in the vtable is stored in the text section of the resulting ELF binary
Attempting to write over the vtable, located in read only memory, results in a Segmentation fault (as expected)
Each object has a pointer to its class's list of functions. The functions are in the same order for derived classes; an overridden function simply replaces the entry at its position in the list.
When you point with a base pointer type, the pointed to object still has the correct _vptr.
Base's vtable:
  [0] Base::function1()
  [1] Base::function2()
D1's vtable:
  [0] D1::function1()
  [1] Base::function2()
D2's vtable:
  [0] Base::function1()
  [1] D2::function2()
Classes further derived from D1 or D2 will just add their new virtual functions to the list, below the two current entries.
When calling a virtual function, we just call the corresponding index; function1 is index 0.
So your call
dPtr->function1();
is conceptually
dPtr->_vptr[0](dPtr);  // pseudocode: _vptr is hidden, and 'this' (dPtr) is passed implicitly
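A runnable version of the question's classes confirms the table above. The only change from the original is that each function returns its own name instead of having an empty body, so that the dispatch can be observed; a virtual destructor is added as well.

```cpp
#include <cassert>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    virtual std::string function1() { return "Base::function1"; }
    virtual std::string function2() { return "Base::function2"; }
};

class D1 : public Base {
public:
    // Replaces slot 0 in D1's vtable; slot 1 still points to Base::function2.
    std::string function1() override { return "D1::function1"; }
};

class D2 : public Base {
public:
    // Replaces slot 1 in D2's vtable; slot 0 still points to Base::function1.
    std::string function2() override { return "D2::function2"; }
};
```

With Base* dPtr = &d1, dPtr->function1() runs D1::function1 while dPtr->function2() runs Base::function2; pointing dPtr at a D2 swaps which one is overridden.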

Smashing C++ VPTRs

I am taking a computer security class and I am reading http://phrack.org/issues/56/8.html.
In bo3.cpp the author creates his own VTABLE, and overwrites VPTR to point to his VTABLE.
To do this he needs the address of VTABLE, which - in this example - is the address of the object.
What is strange to me is that all of this is executed within the exploited code. I am a beginner, but I think that this technique cannot be used in practice, because we cannot edit the source code and recompile it. Is there any way to build the VTABLE and overwrite the VPTR (for example with a buffer overflow) from outside the code (without editing the vulnerable source code)?
Update: Let's say the vulnerable program asks for a string input, and I can overwrite the VPTR with it. I write my own code, creating a VTABLE in it, and printing the VTABLE address. I run my code and pass my VTABLE address (repeated enough times to overwrite the target VPTR) as the string input to the vulnerable program. Will this work? Is there a better/simpler way to do this?
Yes you can use the technique.
The common way dynamic polymorphism (virtual functions) is implemented in C++ is with a hidden vtable pointer member. That member is present in all objects that have virtual functions. It is most typically located at the very beginning of the object.
If a virtual function is called for an object, the program calls a function from the vtable it points to. So if you manage to overwrite the beginning of the object with your data, you can make the vtable pointer point anywhere else and cause something else to be executed instead of a virtual member function.
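The mechanism can be seen without any memory corruption by emulating the table by hand: the call site just follows whatever pointer the object holds. A minimal sketch; Greeter, GreeterOps and the functions are hypothetical, and in the attack described above the compiler's hidden vptr is overwritten instead of a visible field being assigned.

```cpp
#include <cassert>

// Hand-rolled stand-in for a compiler vtable (all names hypothetical).
struct Greeter;
struct GreeterOps { int (*greet)(const Greeter*); };
struct Greeter { const GreeterOps* ops; };

int real_greet(const Greeter*) { return 1; }
int other_greet(const Greeter*) { return 2; }

const GreeterOps real_ops{real_greet};
const GreeterOps other_ops{other_greet};

// The call site only follows the pointer; it cannot tell which table it got.
int dispatch(const Greeter* g) { return g->ops->greet(g); }
```

Changing g.ops from &real_ops to &other_ops changes what dispatch runs, without touching dispatch itself.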
It is impossible to use that exploit if the program has no permission to write to executable memory (or to execute writable memory), but that is not the case with the majority of widespread operating systems.

do all instances of the same c++ class share a vtable or would each one get its own?

If Base is a base class and Derived a derived class and there are 25 instances of Derived, how are the vtables set up to be accessed by all the instances? Where are they loaded in the memory?
Compilers are allowed to implement dynamic dispatch however they want in C++. I don't think there is actually any requirement to even use a vtable at all, but it would be very unusual to find a compiler that didn't.
In most cases I think that each class (that contains some virtual methods) owns a single vtable (so if I had 5 instances of class A, I would still only have 1 vtable), but this behaviour should not be relied upon in any way.
Non-virtual classes have no need for vtables as far as I know.
Reading your question, it seems as if you think that each object has its own copy of the code. I'm not sure, and I don't want to accuse you of anything like that, but just in case ...
Google something like: "what does a c++ object look like in memory"
There will be one vtable somewhere in memory, probably in the same place as the code.
Each instance of the class will contain a single pointer to the vtable for that class, so in your case all 25 instances will contain a pointer to one copy of the vtable.
Multiple and virtual inheritance complicate things, but the principle is the same.

Where is the virtual function table for C++ class stored?

I tried to find where exactly the virtual function table of a C++ class gets stored.
I found some answers saying it's a "static array of function pointers",
so will it be stored in the read-only data segment (the initialised one)?
Most probably yes. However, it's not mandated. It's not even mandated that polymorphism is implemented via a virtual function table, but on most platforms it is. These are implementation details: as long as a compiler obeys the behavior set by the standard, it can do whatever it wants.
A vftable is one per class and stored in only one place in memory.
When you make any function virtual, the compiler will insert a vptr inside your class. As a result, the size of the class will grow by 4 bytes (on Win32). This pointer holds the address of the virtual table (vtable). The vtable is constructed by the compiler at compile time and is basically nothing but an array of function pointers. The function pointers are actually pointers to the virtual functions of that particular class. To be more exact, the virtual table is a static array of function pointers, so that different instances of the same class can share that vtable. Since static members are stored in the data section (.data), the vtable is also stored in the data section of the executable.
It's implementation dependent, yes.
And for g++ (4.9.0), virtual table (not the pointer) is stored in the section .rodata of ELF file and its corresponding segment LOAD in memory.