I think I have a clear understanding of class data members and their in-memory representation:
The members of a class define the layout of objects: data members are stored one after another in memory. When inheritance is used, the data members of the derived class are just added to those of a base.
However, when I try to figure out how the "blueprint" of an object is modified by function members with additional syntax elements, I run into difficulties. Below, I've tried to list all the problematic[1] function member syntax that makes it difficult for me to figure out the object's memory size and structure.
Class member functions that I couldn't figure out:
function type: lambda, pointer to function, modifying, non-modifying.
containing additional syntax elements: friend(with non-member), virtual, final, override, static, const, volatile, mutable.
Question:
What are the differences between the member functions with different specifiers, in the context of object memory layout and how they affect it?
Note:
I've already read this and this, which do not provide a satisfying answer[2]. This talks about the general case (which I understand) and is the closest to a duplicate. (BUT I am particular about the list of problematic syntax that is my actual question and is not covered there.)
1. In terms of affecting object memory layout.
2. The first is talking about the GCC compiler and the second provides a link to a book on Amazon.
Member functions are not part of an object's memory layout. The only thing attributable to member functions is a hidden reference to an implementation-defined structure used to perform dynamic dispatch, such as a virtual method table. This reference is added to your object only if it has at least one virtual member function, so objects of classes that do not have virtual functions are free from this overhead.
Going back to your specific question, the only modifier to a member function that has any effect on the object's memory layout is virtual*. Other modifiers affect how the function itself is interpreted, but they do not change the memory layout of your object.
* override keyword also indicates the presence of a virtual member function in a base class, but it is optional; adding or removing it does not change memory layout of the object.
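The point above can be checked with a small sketch. The type names (Plain, WithMethods, WithVirtual) are made up for illustration, and the two static_asserts describe what mainstream implementations do rather than something the standard spells out in these words.

```cpp
#include <cassert>
#include <cstdint>

// Only data members: the baseline layout.
struct Plain {
    std::uint32_t x;
    std::uint32_t y;
};

// Same data plus non-virtual member functions with assorted specifiers.
struct WithMethods {
    std::uint32_t x;
    std::uint32_t y;

    WithMethods() : x(0), y(0) {}
    std::uint32_t sum() const { return x + y; }    // const member function
    void reset() volatile { x = 0; y = 0; }        // volatile member function
    static std::uint32_t answer() { return 42; }   // static member function
    friend bool same(const WithMethods& a, const WithMethods& b) {
        return a.x == b.x && a.y == b.y;           // friend non-member function
    }
};

// Same data plus one virtual function.
struct WithVirtual {
    std::uint32_t x;
    std::uint32_t y;
    virtual ~WithVirtual() = default;
};

// Assumption: holds on mainstream implementations (GCC, Clang, MSVC).
static_assert(sizeof(WithMethods) == sizeof(Plain),
              "non-virtual member functions should not change the size");
static_assert(sizeof(WithVirtual) > sizeof(Plain),
              "a virtual function typically adds a hidden table pointer");

// Small usage check: the functions work, yet occupy no space in the objects.
void example() {
    WithMethods a;
    WithMethods b;
    assert(same(a, b));
    assert(a.sum() == 0);
    assert(WithMethods::answer() == 42);
}
```

If either static_assert fails on some toolchain, that tells you the implementation lays out objects differently from the common case described here.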
Related
I was just curious, does the creation of an object in C++ allocate space for a new copy of its member functions? At the assembly or machine code level, where no classes exist, do all calls for a specific function from different objects of the same class actually refer to the same function pointer or are there multiple function blocks in memory and therefore different pointers for each and every member function of every object derived from the same class?
Languages usually implement features as simply as possible.
Under the hood, class methods are just ordinary functions that take a pointer to the object as an argument; an object is in effect just a data structure plus the functions that can operate on it.
Normally the compiler knows which function should operate on the object.
With polymorphism, however, a function may be overridden.
The compiler then doesn't know the actual type of the object; it may be Derived1 or Derived2.
In that case the compiler adds to the object a pointer to a VTable, which contains pointers to the functions that could have been overridden.
For overridable methods, the program then looks up in this table which function should be executed.
You can see how it can be implemented by seeing how polymorphism can be implemented in C:
How can I simulate OO-style polymorphism in C?
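As a rough illustration of the mechanism described in this answer, here is a hand-written table of function pointers. All names (Shape, ShapeVTable, rect_area, tri_area) are invented for the sketch; real compilers generate something along these lines, but the exact form is implementation-defined.

```cpp
#include <cassert>

struct Shape;

// One table per "class", shared by all of its objects.
struct ShapeVTable {
    int (*area)(const Shape* self);
};

struct Shape {
    const ShapeVTable* vptr;  // the only per-object cost of dynamic dispatch
    int w;
    int h;
};

// "Member functions" are ordinary functions that receive the object pointer.
int rect_area(const Shape* self) { return self->w * self->h; }
int tri_area(const Shape* self) { return self->w * self->h / 2; }

const ShapeVTable rect_vtable{rect_area};
const ShapeVTable tri_vtable{tri_area};

// A "virtual call": look the function up through the object's table pointer.
int area(const Shape& s) { return s.vptr->area(&s); }

void example() {
    Shape r{&rect_vtable, 4, 6};
    Shape t{&tri_vtable, 4, 6};
    assert(area(r) == 24);  // dispatched to rect_area
    assert(area(t) == 12);  // dispatched to tri_area
}
```

Note that creating more Shape objects copies only the vptr and the data; the function code and the tables exist once.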
No, it does not. Functions are class-wide. When you allocate an object in C++, it contains space for all its data members plus, if the class has virtual functions, a pointer to a VTable that is shared by all objects of the class and holds pointers to its virtual functions, whether declared in its own class or inherited from parent classes.
When you call a virtual method on that object, you essentially perform a look-up in that VTable and the appropriate method is called; calls to non-virtual methods are resolved at compile time.
When using C libraries it might be appropriate to derive a class from a C structure and add some methods that operate on it, without adding any data members. For example, you could add a constructor to initialize the members more conveniently. Such objects might then be implicitly upcast and passed to the C APIs.
There might be cases where the API expects an array of the C structures. But is there any guarantee in the C++ language that the derived objects have the same size as the base struct, so that the distances between the objects are correct?
BTW: None of the suggestions of similar questions matches my question.
In general, there is no such guarantee. And in particular, if you introduce virtual member functions for example, then there would typically be additional memory used for the virtual table pointer.
If we add the assumption that the derived class is standard layout, and that no non-standard features such as "packing" are used, then the size would be the same in practice.
However, even if the size is the same, you technically cannot pretend that an array of derived type is an array of base type. In particular, iterating the "pretended" array with a pointer to base would have undefined behaviour. At least that's how it is within C++. Those operations are presumably performed in C across the API. I really don't know what guarantees there are in that case.
I would recommend that if you need to deal with arrays of the C struct (i.e. the pointer would be incremented or subscripted by the API), then instead of wrapping the individual struct, create a C++ wrapper for the entire array.
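A minimal sketch of that recommendation follows. The struct c_point and the function c_sum_x are hypothetical stand-ins for a real C library; the wrapper class PointArray is likewise invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C struct, standing in for a type from a real C header.
struct c_point {
    int x;
    int y;
};

// Hypothetical C API: walks the array with plain pointer arithmetic.
int c_sum_x(const c_point* pts, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += pts[i].x;
    return total;
}

// C++ wrapper around the whole array. The elements stay plain c_point
// objects, so the API sees exactly the element type and stride it expects.
class PointArray {
public:
    void add(int x, int y) { points_.push_back(c_point{x, y}); }
    std::size_t size() const { return points_.size(); }
    int sum_x() const { return c_sum_x(points_.data(), points_.size()); }

private:
    std::vector<c_point> points_;
};

void example() {
    PointArray pts;
    pts.add(1, 2);
    pts.add(3, 4);
    assert(pts.size() == 2);
    assert(pts.sum_x() == 4);
}
```

The convenience methods live on the container, so no derived element type is ever passed off as an array of the base struct.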
But is there any guarantee in the C++ language that the derived objects have the same size as the base struct
In general I would expect that deriving will not add anything to the class memory layout, as long as you do not introduce new data members or virtual functions. Using virtual functions results in adding the v-table pointer.
The implementation is also free to add a v-table pointer if you use virtual inheritance. This will also change the layout for most compilers (clang and g++ use a vtable in that case!)
But all of this is implementation-specific, and I do not know of any guarantee in the C++ standard that the class layout lets you use a derived class as the base class without a cast operation.
You also have to think of padding of data structures which may be different for the derived class.
Creating something (the derived class) and using it as something different (the base struct) is, in general, undefined behavior. We are not talking about cast operations! If you cast the derived class to the base class, everything is fine. But packing many instances into an array of the derived class and simply using it as an array of the base class is undefined behavior.
Let's say I have a class
struct Foo {
    uint32_t x;
    uint32_t y;
};
Does the C++ standard make any mention of whether sizeof(Foo) should be the same as sizeof(Bar) if Bar just adds a constructor?
struct Bar {
    uint32_t x;
    uint32_t y;
    Bar(uint32_t a = 1, uint32_t b = 2) : x(a), y(b) {}
};
The reason I am asking is that Foo is sent across the network as a void* and I cannot change its size, but if possible I would like to add a constructor.
I found some related: here and here, but there the answers focus mainly on virtuals changing the size and I was looking for something more definite than "[...] all implementations I know of [...]".
PS: To avoid misunderstanding... I am not asking how to construct a Foo and then send it as a void*, nor am I asking for a workaround to make sure the size does not change; I am really just curious whether the standard says anything about sizeof in that specific case.
C++ 98 only guarantees layout for “plain old data” objects and those don't permit constructors.
C++ 11 introduces “standard layout types”, which still guarantee layout, but do permit constructors and methods to be added (and permits non-virtual bases to be added with some exceptions for empty classes and duplicates).
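One way to make this checkable is to let the compiler verify the properties, using the Foo and Bar from the question. This is a sketch, not a proof: the size equality is what one would expect in practice for two standard-layout types with the same members, and the memcpy in example() assumes the two types are layout-compatible.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Foo {
    std::uint32_t x;
    std::uint32_t y;
};

struct Bar {
    std::uint32_t x;
    std::uint32_t y;
    Bar(std::uint32_t a = 1, std::uint32_t b = 2) : x(a), y(b) {}
};

// Adding a constructor keeps Bar a standard-layout type (C++11 and later).
static_assert(std::is_standard_layout<Bar>::value,
              "Bar must stay standard-layout");
// The implicit copy operations are still trivial, so byte-wise copies are fine.
static_assert(std::is_trivially_copyable<Bar>::value,
              "Bar must be safe to copy byte-wise");
// Expected in practice; the build fails if the implementation disagrees.
static_assert(sizeof(Bar) == sizeof(Foo),
              "the constructor must not change the size");

void example() {
    Bar b;                          // defaults: x = 1, y = 2
    Foo f;
    std::memcpy(&f, &b, sizeof f);  // assumes Foo and Bar are layout-compatible
    assert(f.x == 1);
    assert(f.y == 2);
}
```

Keeping the static_asserts next to the type definition means an accidental change (for example, adding a virtual function) is caught at compile time instead of on the wire.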
Actually, the only thing that influences the layout is the data contained in an object -- with one important exception, coming to that later. If you add any functions (and constructors in reality are nothing more than some kind of static function only with special syntax), you do not influence the layout of the class.
If you have a polymorphic class (a class with at least one virtual function, including a virtual destructor), your class will contain a pointer to a vtable (this is not enforced by the standard, but it is the usual way polymorphism is implemented), and this is just a pointer to some specific memory location elsewhere. The vtable itself will be modified if you add more virtual functions, but without any influence on the layout of your data containers (which your class instances actually are).
Now the exception mentioned above: If you add a virtual function to a class (or make an existing one virtual, including the destructor) while the class did not have any virtual functions before (i. e. no own virtual functions and no inherited ones!), then a vtable will be newly added and then the data layout does change! (Analogously, the vtable pointer is removed if you make all functions non-virtual - including all inherited ones).
Guaranteed by the standard?
Edit:
From C++ standard, section 4.5 The C++ object model (§ 1):
[...] Note: A function is not an object, regardless of whether or not it occupies storage in the way that objects do. — end note [...]
Next is deduction (of mine): A function (note: not differentiated if free or member one) is not an object, thus cannot be a subobject, thus is not part of the data of an object.
Further (same §):
An object has a type (6.9). Some objects are polymorphic (13.3); the implementation generates information associated with each such object that makes it possible to determine that object’s type during program execution.
(That is the vtables! - note that it is not explicit about how they are implemented, does not even enforce them at all, if a compiler vendor finds some alternative, it is free to use it...).
For other objects, the interpretation of the values found therein is determined by the type of the expressions (Clause 8) used to access them.
Well, I couldn't find any hints (so far) about whether or how functions influence the layout of a class, neither when skimming the standard as a whole nor (with special attention) in clause 8 (as referenced above) or clause 12 "Classes" (especially 12.2 "Class members").
It seems this is not explicitly specified (though I won't swear I haven't overlooked something). Maybe it is valid to deduce it from the mere fact that functions are not objects.
Standard layout classes, as referenced by Jan Husec, provide further guarantees on layout, such as no reordering of members (which is allowed for members of different accessibility), alignment conditions, ...
From those conditions for being a SLC, I deduce that for these, at least, the guarantee applies, as all that is referenced for being layout compatible is the data members, no mention of (non-static member) functions (other than no virtual ones being allowed...).
As explained by other answers, the layout can change if a virtual destructor is added.
If your destructor is not virtual, you can go ahead and add it. But if you need a virtual destructor, you can wrap the struct in another struct and place the destructor there.
Make sure the fields that you want to modify are accessible to the wrapper class. A friend relation should be good enough.
Although all your inheritance will be through the wrapper. The inner struct is just to maintain the layout so you can send it over the network.
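A possible shape for this wrapper is sketched below. The names Payload and Message are invented, and the sketch exposes the inner struct through an accessor rather than the friend relation mentioned above; either would work.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Wire format: plain data only, no virtual functions.
struct Payload {
    std::uint32_t x;
    std::uint32_t y;
};

// Polymorphic wrapper: owns a Payload and carries the virtual destructor.
// Class hierarchies derive from Message; only the Payload is sent.
class Message {
public:
    Message(std::uint32_t x, std::uint32_t y) : payload_{x, y} {}
    virtual ~Message() = default;

    const Payload& payload() const { return payload_; }  // what gets sent

private:
    Payload payload_;
};

// The inner struct keeps its layout guarantees; the wrapper does not need them.
static_assert(std::is_standard_layout<Payload>::value,
              "Payload must stay standard-layout");
static_assert(!std::is_standard_layout<Message>::value,
              "a class with virtual functions is not standard-layout");

void example() {
    Message m(3, 4);
    assert(m.payload().x == 3);
    assert(m.payload().y == 4);
    // Assumption: no padding between two uint32_t members.
    assert(sizeof(Payload) == 2 * sizeof(std::uint32_t));
}
```

The virtual table pointer lands in Message, so the bytes of Payload are unaffected by anything added to the wrapper or its derived classes.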
In The C++ Programming Language by Bjarne Stroustrup, it is said to be possible that a pointer to virtual member function can be passed between different address spaces.
Because a pointer to a virtual member is a kind of offset, it does not depend on
an object's location in memory. A pointer to a virtual member can therefore be
passed between different address spaces as long as the same object layout is used
in both. Like pointers to ordinary functions, pointers to non-virtual functions
cannot be exchanged between address spaces.
However, I don't understand why pointers to non-virtual functions can't. Like pointers to virtual function, it also acts like an index as Bjarne Stroustrup pointed out.
However, a pointer to member isn't a pointer to a piece of memory the way a
pointer to a variable or a pointer to a function is. It is more like an offset
into a structure or an index into an array, but of course an implementation
takes into account the differences between data members, virtual functions,
non-virtual functions, etc.
I, of course, understand the differences between virtual function and non-virtual function such as vtbl and so on. However, on every instantiation of class, it's not like the same member functions are assigned again on memory, which means we cannot calculate memory location of member functions depending on memory address of object. In other words, non-virtual functions do not depend on object's location in memory. I think there is only one interface (=functions) and many objects (=maybe representation). If pointers to non-virtual functions act like just an identifier between member functions, it does not make sense it cannot be passed and used between processes.
Like pointers to virtual function, it also acts like an index as Bjarne Stroustrup pointed out.
Yes, but an index relative to what? A pointer to a non-virtual function is an index into the process' address space. The function can be at a different address in different processes, so has different indices. A pointer (which is just a memory address) to a given function in one process could point to something completely different in another process.
A pointer to a virtual function is an offset relative to the object's address, so given an object (in any process' own address space) you can find the virtual function by applying the offset to get to the vtbl entry. The objects will be at different addresses in different processes (or even different addresses for different instances in one process) but the offset into the vtbl is fixed.
In other words, non-virtual functions do not depend on object's location in memory.
Exactly, that's the problem! They depend on the function's location in memory, which is not constant between processes.
I think there is only one interface (=functions) and many objects (=maybe representation).
Right.
If pointers to non-virtual functions act like just an identifier between member functions, it does not make sense it cannot be passed and used between processes.
But they don't act "like just an identifier" ... they are pointers. They are addresses in memory. If the function is at address 0x12341234 in one process and at address 0x00011234 in another process, you can't pass the pointer between processes, it won't point to the same thing!
In the second quotation, when he talks about pointers-to-members being offsets, this does not include pointers to non-virtual member functions. These are typically implemented as the address of the function code, so they're unlikely to work in another process with its own address space.
A non-virtual function is called like a non-member function, just with an extra hidden this argument. A pointer to one will contain the address of the function to call - any extra levels of indirection are unnecessary and would slow down the function call and bloat the program.
A virtual function is called by looking up the address in a table associated with the object, whose contents depend on the dynamic type. A pointer to one will contain the index into that table. It can't store the address of any particular function: it may refer to a base class member, which is overridden differently by different derived classes, so that the correct override is only known by virtual lookup.
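The difference in behaviour can be observed without relying on any particular representation. In the sketch below (Base, Derived and call are illustrative names), the same pointer-to-member value selects different code depending on the object when the member is virtual, but always the same function when it is not.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }  // virtual
    std::string kind() const { return "Base"; }          // non-virtual
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
    std::string kind() const { return "Derived"; }  // hides, does not override
};

// Calls whatever member function the pointer-to-member designates.
std::string call(const Base& obj, std::string (Base::*pmf)() const) {
    return (obj.*pmf)();
}

void example() {
    Base b;
    Derived d;
    // Virtual: the call goes through the object's dynamic type.
    assert(call(b, &Base::name) == "Base");
    assert(call(d, &Base::name) == "Derived");
    // Non-virtual: the pointer designates one fixed function.
    assert(call(d, &Base::kind) == "Base");
}
```

This matches the explanation above: a pointer to a virtual member cannot simply hold one code address, because the function it ends up calling is only known once an object is supplied.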
Think like Bjarne. Hint, he's into performance. He wants all the speed he can get.
If you have a non-virtual method you want to call that method immediately. Any indirection will just slow things down. A virtual method has some indirection built in. It needs to look up which particular type of object it's in using this and an offset, which C++ will bypass if it's not needed. In non-virtual functions it's not necessary and is skipped for performance.
In C++ implementations, typically code is not stored (in any form) inside class instances. The code segment is not in the same memory space as objects and the like. This means that member functions are not "stored" inside class instances.
But when a question was asked about this, I got to wondering: to what extent, if at all, does the standard prohibit member functions being stored inside their encapsulating class, to the extent that instantiating the class makes a copy of those functions? Theoretically, could I make an implementation that worked this way? And could it even remotely abide by the common ABIs?
If, in C++, code were a first-class value, then the code for a member function would be simply a const static class member, and you would no more expect to find that in an instance than you would any other static data member. (§ 9.4.2: "A static data member is not part of the subobjects of a class.")
However, code is not considered a value, and furthermore you cannot even construct a pointer to a member function (although you can construct a "pointer to member", that is not really a pointer since it is not usable without a reference to an instance). That makes member function code different both from static data members and non-member functions, both of which allow the creation of free-standing pointers, which furthermore have equality guarantees which (more or less) preclude copying.
Class instances do contain a reference to virtual member functions (indirectly, in most implementations; the pointer is actually to a static vtable) which must be copied when a new instance is created. No requirement is made on the size of the reference, so in theory (as far as I know) there is nothing to stop an implementation from avoiding the indirections and storing the entire code anew for each instance of the class.
But there is an exception for standard-layout types, which is a subset of classes with no virtual member functions, expressed in § 9.12/18, which requires that two standard-layout types with identical initial members have identical layout for the initial members. Recalling that standard-layout objects must be simply copyable with memcpy (§3.9/3), must be contiguous in memory (§1.8/5), and must include their members in order (§9.12/13), this requirement makes it effectively impossible to include class-specific static data in any standard-layout object, which would include the code for member functions.
So I conclude that at least for standard-layout objects, the C++ standard does prohibit the storage of static data, including code for member functions, within the object representation.