Exchangeability of pointers to member functions between different address spaces - c++

In The C++ Programming Language, Bjarne Stroustrup says that a pointer to a virtual member function can be passed between different address spaces:
Because a pointer to a virtual member is a kind of offset, it does not depend on
an object's location in memory. A pointer to a virtual member can therefore be
passed between different address spaces as long as the same object layout is used
in both. Like pointers to ordinary functions, pointers to non-virtual functions
cannot be exchanged between address spaces.
However, I don't understand why pointers to non-virtual functions can't be. Like a pointer to a virtual function, such a pointer also acts like an index, as Bjarne Stroustrup points out:
However, a pointer to member isn't a pointer to a piece of memory the way a
pointer to a variable or a pointer to a function is. It is more like an offset
into a structure or an index into an array, but of course an implementation
takes into account the differences between data members, virtual functions,
non-virtual functions, etc.
I do, of course, understand the differences between virtual and non-virtual functions, such as the vtbl and so on. However, member functions are not allocated again in memory for every instance of a class, which means we cannot calculate the memory location of a member function from the memory address of an object. In other words, non-virtual functions do not depend on the object's location in memory. I think there is only one interface (=functions) and many objects (=maybe representation). If pointers to non-virtual functions act just like identifiers that distinguish member functions, it does not make sense that they cannot be passed and used between processes.

Like pointers to virtual function, it also acts like an index as Bjarne Stroustrup pointed out.
Yes, but an index relative to what? A pointer to a non-virtual function is an index into the process' address space. The function can be at a different address in different processes, so has different indices. A pointer (which is just a memory address) to a given function in one process could point to something completely different in another process.
A pointer to a virtual function is an offset that is resolved relative to an object: given an object (in any process's own address space) you can find the virtual function by following the object's vtbl pointer and applying the offset to get to the vtbl entry. The objects will be at different addresses in different processes (or even different addresses for different instances in one process) but the offset into the vtbl is fixed.
In other words, non-virtual functions do not depend on object's location in memory.
Exactly, that's the problem! They depend on the function's location in memory, which is not constant between processes.
I think there is only one interface (=functions) and many objects (=maybe representation).
Right.
If pointers to non-virtual functions act like just an identifier between member functions, it does not make sense it cannot be passed and used between processes.
But they don't act "like just an identifier" ... they are pointers. They are addresses in memory. If the function is at address 0x12341234 in one process and at address 0x00011234 in another process, you can't pass the pointer between processes, it won't point to the same thing!

In the second quotation, when he talks about pointers-to-members being offsets, this does not include pointers to non-virtual member functions. These are typically implemented as the address of the function code, so they're unlikely to work in another process with its own address space.

A non-virtual function is called like a non-member function, just with an extra hidden this argument. A pointer to one will contain the address of the function to call - any extra levels of indirection are unnecessary and would slow down the function call and bloat the program.
A virtual function is called by looking up the address in a table associated with the object, whose contents depend on the dynamic type. A pointer to one will contain the index into that table. It can't store the address of any particular function: it may refer to a base class member, which is overridden differently by different derived classes, so that the correct override is only known by virtual lookup.
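The difference described above can be observed from standard C++ alone. The following is a minimal sketch, with the hypothetical classes Shape and Circle and the hypothetical helper call invented for illustration: a pointer to a virtual member taken from the base class picks the override at the moment it is applied to an object, while a pointer to a non-virtual member always names one fixed function.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, used only for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }  // virtual
    std::string kind() const { return "generic"; }        // non-virtual
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
    std::string kind() const { return "round"; }  // hides Shape::kind, no override
};

// Hypothetical helper: applies a pointer-to-member-function to any Shape.
std::string call(const Shape& s, std::string (Shape::*pmf)() const) {
    return (s.*pmf)();
}

// Small self-check; call it from main() to run the example.
void demo_member_function_pointers() {
    Shape s;
    Circle c;
    // Same pointer value, different function reached: resolved by virtual lookup.
    assert(call(s, &Shape::name) == "shape");
    assert(call(c, &Shape::name) == "circle");
    // Non-virtual: the pointer names Shape::kind itself, whatever the object is.
    assert(call(c, &Shape::kind) == "generic");
}
```

How the two kinds of pointer are represented internally is up to the implementation; the sketch only shows the behavior the language guarantees.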

Think like Bjarne. Hint, he's into performance. He wants all the speed he can get.
If you have a non-virtual method you want to call that method immediately. Any indirection will just slow things down. A virtual method has some indirection built in. It needs to look up which particular type of object it's in using this and an offset, which C++ will bypass if it's not needed. In non-virtual functions it's not necessary and is skipped for performance.

Related

Does each object in c++ contain a different version of the class's member functions?

I was just curious, does the creation of an object in C++ allocate space for a new copy of its member functions? At the assembly or machine code level, where no classes exist, do all calls for a specific function from different objects of the same class actually refer to the same function pointer or are there multiple function blocks in memory and therefore different pointers for each and every member function of every object derived from the same class?
Usually languages implement functionalities as simply as possible.
Class methods are, under the hood, just simple functions taking an object pointer as an argument, where the object is in fact just a data structure plus functions that can operate on this data structure.
Normally the compiler knows which function should operate on the object.
However, there is the case of polymorphism, where a function may be overridden.
Then the compiler doesn't know what the type of the class is; it may be Derived1 or Derived2.
In that case the compiler will add to the object a pointer to a VTable that contains function pointers to the functions that could have been overridden.
Then, for overridable methods, the program will make a lookup in this table to see which function should be executed.
You can see how it can be implemented by seeing how polymorphism can be implemented in C:
How can I simulate OO-style polymorphism in C?
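As a rough illustration of the mechanism described in this answer, here is a hand-written dispatch table using only plain structs and function pointers. All names (Animal, AnimalVTable, legs, is_dog and so on) are hypothetical, and real compilers lay things out in their own way; this is a sketch of the idea, not of any particular implementation.

```cpp
#include <cassert>

struct Animal;  // forward declaration for the table's function signature

// Hand-written dispatch table: one entry per "virtual" function.
struct AnimalVTable {
    int (*legs)(const Animal* self);
};

// Every "object" carries a pointer to the table of its class.
struct Animal {
    const AnimalVTable* vptr;
};

// The "methods" are ordinary functions taking the object explicitly.
int dog_legs(const Animal*)  { return 4; }
int bird_legs(const Animal*) { return 2; }

// One table per class, shared by all instances of that class.
const AnimalVTable dog_vtable  = { dog_legs };
const AnimalVTable bird_vtable = { bird_legs };

Animal make_dog()  { return Animal{ &dog_vtable }; }
Animal make_bird() { return Animal{ &bird_vtable }; }

// "Virtual call": look the function up through the object's table pointer.
int legs(const Animal& a) { return a.vptr->legs(&a); }

// "Non-virtual call": the target is known at compile time, no lookup needed.
bool is_dog(const Animal& a) { return a.vptr == &dog_vtable; }
```

Note that the functions exist once, the tables exist once per class, and each object only stores a single pointer.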
No, it does not. Functions are class-wide. When you allocate an object in C++ it will contain space for all its attributes plus, if the class has virtual functions, a pointer to a VTable holding pointers to those virtual functions, be they from its own class or inherited from parent classes.
When you call a virtual method on that object, you essentially perform a look-up in that VTable and the appropriate method is called; a non-virtual method is called directly.

In C++, why do pointers to members of a class contains offsets rather than addresses?

Normally pointers contain addresses. Why do pointers to member of a class contain offsets?
Talking about pointers to class data members, not pointers to member functions.
A pointer to a class member differs from a regular pointer in that it by itself doesn't actually point at a single location in memory. For example, if you have this setup:
struct MyStruct {
    int x;
    int y;
};
int MyStruct::* ptr = &MyStruct::y;
Then the pointer ptr isn't actually pointing to a memory address because there is no one object MyStruct::y. Rather, each instance of MyStruct has its own copy of the data member y.
The C++ standard doesn't mandate how pointers-to-member are actually implemented, but a common strategy is to have the pointer-to-member store an offset in bytes from the base of the object to the field in question. That way, when you write something like
MyStruct ms;
ms.*ptr = 137;
The compiler can generate code that says "go to the base address of ms, skip forward a number of bytes specified by the value stored in ptr, then write 137."
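Put together as a complete sketch, the snippets above might look like this. The helper set_member and the function demo_pointer_to_data_member are hypothetical names added for illustration; the point is that one pointer-to-member value can be applied to any number of objects.

```cpp
#include <cassert>

struct MyStruct {
    int x;
    int y;
};

// Hypothetical helper: writes `value` into whichever int member `member` names.
void set_member(MyStruct& obj, int MyStruct::* member, int value) {
    obj.*member = value;
}

// Small self-check; call it from main() to run the example.
void demo_pointer_to_data_member() {
    int MyStruct::* ptr = &MyStruct::y;  // names "the y member", no object yet
    MyStruct a{1, 2};
    MyStruct b{3, 4};
    a.*ptr = 137;            // ptr applied to a
    set_member(b, ptr, 42);  // the same ptr applied to b
    assert(a.x == 1 && a.y == 137);
    assert(b.x == 3 && b.y == 42);
}
```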
Pointers do not have to contain addresses. Pointers are tools to access a certain object indirectly - i.e. write code which would access an object which is unknown at compile time. Example:
a = 10; // It is known at compile time, 10 is written to object a
*pi = 10; // It is not known at compile time where 10 is written to
For normal objects, having an address in the pointer is the cheapest way to achieve this goal (however, it won't be the case for C++ script!).
For members, though, a normal address wouldn't work - there is no such thing as an address of a member of the class! You can only know the physical address of the member when you have a class object.
So, you have to use some sort of offset, which you can then translate to a physical address once the actual object is known.
Pointers-to-members are not pointers.
The C++ type system contains several kinds of compound types, and there are two separate, sibling kinds of compound types that are not related to one another, despite having similar sounding names: Pointers, and "pointers to non-static class members". I'll call the latter "pointer-to-members", with hyphen, to stress that those are not a special kind of pointer, but really something entirely separate.
By the way, the type trait library separates the two concepts via std::is_pointer and std::is_member_pointer.
The two kinds of types serve entirely distinct purposes:
A pointer can represent the address of an object (or function) (or be null). That is, given a dereferenceable pointer, there is an actual concrete thing there that it's pointing at.
A pointer-to-member represents an abstract reference from a class to a non-static class member. Note that there are no objects involved. Such a pointer value does not point at anything concrete, and it has no concept of "dereferencing".
To repeat the final sentence: A pointer-to-member cannot be dereferenced, and it is not in any sense the address of an object. Rather, a pointer-to-member (specifically, to data member) can be applied to an object instance of its class, and together with the object it selects the subobject of that object corresponding to the class member that it represents. (Pointers-to-member-function have a slightly different notion of being applied to an instance; the result of the application is a function call.)
Returning to your question at last: A pointer holds an address of an object or function because it points at an object or function. A pointer-to-member does not hold an address, because it doesn't point at anything; it represents a relationship.
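The separation between the two kinds of types mentioned above can be checked at compile time with the standard type traits. A small sketch, using a hypothetical struct Widget:

```cpp
#include <type_traits>

// Hypothetical class, used only for illustration.
struct Widget {
    int value;
    void poke() {}
};

// An ordinary pointer is a pointer, and not a pointer-to-member.
static_assert(std::is_pointer<int*>::value, "int* is a pointer");
static_assert(!std::is_member_pointer<int*>::value, "int* is not a pointer-to-member");

// A pointer-to-member is not a pointer in the type system.
static_assert(std::is_member_pointer<int Widget::*>::value, "pointer-to-data-member");
static_assert(!std::is_pointer<int Widget::*>::value, "not an ordinary pointer");

// The same holds for pointers to member functions.
static_assert(std::is_member_function_pointer<decltype(&Widget::poke)>::value,
              "pointer-to-member-function");
static_assert(!std::is_pointer<decltype(&Widget::poke)>::value, "not an ordinary pointer");
```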
In C++, class members are closely related to their respective class. Thus, a declaration of a pointer to member always contains the class it belongs to. Whether this results in an offset into the class instance being stored in the pointer, or in something else, is not specified.
That way the C++ standard supports nearly any memory addressing scheme like, for example, the non-linear page::offset addresses of early x86, where you had to distinguish between near and far pointers for better performance.
Static class members are a different case. They are like global functions/variables in the namespace of the class, possibly with protected or private access restrictions. You may retrieve pointers to static members and use them like regular pointers.
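A short sketch of the static case, using a hypothetical struct Counter: taking the address of a static member yields an ordinary pointer that can be dereferenced or called without any object.

```cpp
#include <cassert>

// Hypothetical class, used only for illustration.
struct Counter {
    static int total;                // one variable for the whole class
    static void bump() { ++total; }  // no hidden `this` argument
};
int Counter::total = 0;

// Small self-check; call it from main() to run the example.
void demo_static_members() {
    int* p = &Counter::total;      // plain int*, not int Counter::*
    void (*f)() = &Counter::bump;  // plain function pointer
    *p = 10;                       // dereferenced directly, no object needed
    f();
    assert(Counter::total == 11);
}
```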

How member functions' additional syntax/specifiers affect memory layout in classes?

I think I have a clear understanding of class data members and their in-memory representation:
The members of a class define the layout of objects: data members are stored one after another in memory. When inheritance is used, the data members of the derived class are just added to those of a base.
However, when I try to figure out how the "blueprint" of an object is modified by its member functions with additional syntax elements, I run into difficulties. In the following text, I've tried to list all the problematic [1] member function syntax that makes it difficult for me to figure out the object memory size and structure.
Class member functions that I couldn't figure out:
function type: lambda, pointer to function, modifying, non-modifying.
containing additional syntax elements: friend(with non-member), virtual, final, override, static, const, volatile, mutable.
Question:
What are the differences between the member functions with different specifiers, in the context of object memory layout and how they affect it?
Note:
I've already read this and this, which do not provide a satisfying answer [2]. This talks about the general case (which I understand) and is the closest to a duplicate. (BUT I am particular about the list of problematic syntax that is my actual question and is not covered there.)
1. In terms of affecting object memory layout.
2. The first is talking about the GCC compiler and the second provides a link to a book on #m#zon.
Member functions are not part of an object's memory layout. The only thing attributable to member functions is a hidden reference to an implementation-defined structure used to perform dynamic dispatch, such as a virtual method table. This reference is added to your object only if it has at least one virtual member function, so objects of classes that do not have virtual functions are free from this overhead.
Going back to your specific question, the only modifier to a member function that has any effect on the object's memory layout is virtual*. Other modifiers have an effect on how the function itself is interpreted, but they do not change the memory layout of your object.
* override keyword also indicates the presence of a virtual member function in a base class, but it is optional; adding or removing it does not change memory layout of the object.
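One way to see this on a given compiler is to compare sizeof for otherwise identical classes. The structs below are hypothetical, and exact sizes are implementation-defined; the three comparisons are what one would expect to hold on common ABIs (for example x86-64 Linux), not something the standard promises.

```cpp
// Hypothetical classes with identical data members.
struct Plain {
    float x, y, z;
};

struct WithMethods {
    float x, y, z;
    float sum() const { return x + y + z; }          // const
    void scale(float k) { x *= k; y *= k; z *= k; }  // modifying
    static int dimensions() { return 3; }            // static
    friend bool same(const WithMethods& a, const WithMethods& b) {  // friend
        return a.x == b.x && a.y == b.y && a.z == b.z;
    }
};

struct OneVirtual {
    float x, y, z;
    virtual ~OneVirtual() = default;
};

struct ManyVirtual {
    float x, y, z;
    virtual ~ManyVirtual() = default;
    virtual float sum() const { return x + y + z; }
    virtual void scale(float k) { x *= k; y *= k; z *= k; }
};

// Non-virtual member functions add nothing to the object.
constexpr bool methods_are_free = sizeof(WithMethods) == sizeof(Plain);
// The first virtual function adds the hidden reference to the dispatch table.
constexpr bool virtual_costs_space = sizeof(OneVirtual) > sizeof(Plain);
// Further virtual functions do not grow the object any more.
constexpr bool one_hidden_pointer = sizeof(ManyVirtual) == sizeof(OneVirtual);
```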

Are methods duplicated in memory for every instance of an object? If so, can this be avoided?

Say I have an object that exists in high quantity, stores little data about itself, but requires several larger functions to act upon itself.
class Foo
{
public:
    bool is_dead();
private:
    float x, y, z;
    bool dead;
    void check_self();
    void update_self();
    void question_self();
};
What behavior can I expect from the compiler - would every new Foo object cause duplicates of its methods to be copied into memory?
If yes, what are good options for managing class-specific (private-like) functions while avoiding duplication?
If not, could you elaborate on this a little?
C++ methods are simply functions (with a convention about this which often becomes the implicit first argument).
Functions are mostly machine code, starting at some specific address. The start address is all that is needed to call the function.
So objects (or their vtable) need at most the address of called functions.
Of course a function takes some place (in the text segment).
But an object won't need extra space for that function. If the function is not virtual, no extra space per object is needed. If the function is virtual, the class has a single vtable, shared by all its instances, and each object holds a pointer to it. Generally, each object has the pointer to the vtable as its first field. This means 8 bytes per object on x86-64/Linux. Each object (assuming single inheritance) has one vtable pointer, independently of the number or of the code size of the virtual functions.
If you have multiple, perhaps virtual, inheritance with virtual methods in several superclasses you'll need several vtable pointers per instance.
So for your Foo example, there is no virtual function (and no superclass containing some of them), so instances of Foo contain no vtable pointer.
If you add one (or many hundreds) of virtual functions to Foo (then you should have a virtual destructor, see rule of three in C++), each instance would have one vtable pointer.
If you want a behavior to be specific to instances (so instances a and b could have different behavior) without using the class machinery for that, you need some member function pointers (in C++03) or (in C++11) some std::function (perhaps anonymous closures). Of course they need space in every instance.
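A minimal sketch of the std::function variant, assuming C++11 or later. The structs Button and PlainButton and the member on_click are hypothetical names; the point is that the callable is stored in each instance, so two instances of the same class can behave differently, at the cost of extra space per object.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class whose behavior is chosen per instance.
struct Button {
    int id;
    std::function<int(int)> on_click;  // stored inside every Button
};

// Same data without the per-instance callable, for size comparison.
struct PlainButton {
    int id;
};

// The callable member necessarily makes each object larger.
static_assert(sizeof(Button) > sizeof(PlainButton),
              "per-instance behavior costs space in every instance");

// Small self-check; call it from main() to run the example.
void demo_per_instance_behavior() {
    Button a{1, [](int x) { return x + 1; }};
    Button b{2, [](int x) { return x * 10; }};
    assert(a.on_click(5) == 6);   // a and b have the same type...
    assert(b.on_click(5) == 50);  // ...but different behavior
}
```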
BTW, to know the size of some type or class, use sizeof .... (it does include the vtable[s] pointer[s] if relevant).
Methods exist once per class in the program, not once per object.
Try reading some good books about C++ to learn such basic facts about the language.

c++: Does a vtable contain pointers to non-virtual functions?

A vtable contains pointers to the virtual functions of a class. Does it also contain pointers to non-virtual functions?
Thx!
It's an implementation detail, but no. If an implementation put pointers to non-virtual functions into a vtable it couldn't use these pointers for making function calls because it would often cause incorrect non-virtual functions to be called.
When a non-virtual function is called the implementation must use the static type of the object on which the function is being called to determine the correct function to call. A function stored in a vtable accessed by a vptr will be dependent on the dynamic type of the object, not any static type of a reference or pointer through which it is being accessed.
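The static/dynamic distinction can be shown with a small sketch. Base, Derived and the member names plain and dynamic are hypothetical; Derived hides the non-virtual function and overrides the virtual one.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, used only for illustration.
struct Base {
    virtual ~Base() = default;
    std::string plain() const { return "Base::plain"; }              // non-virtual
    virtual std::string dynamic() const { return "Base::dynamic"; }  // virtual
};

struct Derived : Base {
    std::string plain() const { return "Derived::plain"; }  // hides Base::plain
    std::string dynamic() const override { return "Derived::dynamic"; }
};

// Small self-check; call it from main() to run the example.
void demo_static_vs_dynamic() {
    Derived d;
    Base& as_base = d;  // static type Base, dynamic type Derived
    assert(as_base.plain() == "Base::plain");         // chosen by static type
    assert(as_base.dynamic() == "Derived::dynamic");  // chosen by dynamic type
    assert(d.plain() == "Derived::plain");            // static type is Derived here
}
```

If plain were looked up through the object's dispatch table, the first assertion would fail, which is the incorrect behavior the answer describes.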
No, it doesn't.
As calls to non-virtual methods can be resolved during compilation (since compiler knows the addresses of non virtual functions), the compiler generates instructions to call them 'directly' (i.e. statically).
There is no reason to go through vtable indirection mechanism for methods which are known during compiling.
Whether or not a "vtable" is used by any implementation isn't defined by the standard. Most implementations use a table of function pointers although the functions pointed to are typically not directly those being called (instead, the pointed to function may adjust the pointer before calling the actual function).
Whether or not non-virtual functions show up in this table is also not defined by the standard. After all, the standard doesn't even require the existence of a vtable. Normally, non-virtual functions are not in a virtual function table since any necessary pointer adjustments and the call can be resolved at compile- or link-time. I could imagine an implementation treating all functions similarly and, thus, using a pointer in the virtual function table in all cases. It wouldn't necessarily be very popular. However, it might be a good way to implement C++ in an environment where it seamlessly interacts with a more flexible object system, e.g., languages where individual functions can be replaced at run-time (my understanding is that something like this is possible, e.g., in Python).
No. A vtable only contains pointers to the virtual functions of its class, including those inherited from base classes.