Memory structure of a function-only object? - c++

Let's say we have a class that looks like this:
class A
{
public:
int FuncA( int x );
int FuncB( int y );
int a;
int b;
};
Now, I know that objects of this class will be laid out in memory with just the two ints. That is, if I make a vector of instances of class A, there will be two ints for one instance, then followed by two ints for the second instance etc. The objects are POD.
BUT let's say the class looks like this:
class B
{
public:
int FuncA( int x );
int FuncB( int y );
};
What do objects of this class look like in memory? If I fill a vector with instances of B... what's in the vector? I've been told that non-virtual member functions are in the end compiled as free functions somewhere completely unrelated to the instances of the class in which they're declared (virtual function are too, but the objects store a vtable with function pointers). That the access restrictions are merely at the semantic, "human" level. Only the data members of a class (and the vtable etc.) actually make up the memory structure of objects.
So again, what do objects of class B look like in memory? Is it some kind of placeholder value? Something has to be there, I can take the object's address. It has to point to something. Whatever it is, is the compiler allowed to inline/optimize out these objects and treat the method calls as just normal free function calls? If I create a vector of these and call the same method on every object, can the compiler eliminate the vector and replace it with just a bunch of normal calls?
I'm just curious.

All objects in C++ are guaranteed to have a sizeof >= 1 so that each object will have a unique address.
I haven't tried it, but I would guess that in your example, the compiler would allocate but not initialize 1 byte for each function object in the array/vector.

As Ferruccio said, All objects in C++ are guaranteed to have a size of at least 1. Mostly likely, it's 1 byte, but fills out the size of the alignment, but whatever.
However, when used as a base class, it does not need to fill any space, so that:
class A {} a; // a is 1 byte.
class B {} b; // b is 1 byte.
class C { A a; B b;} c; // c is 2 bytes.
class D : public A, B { } d; // d is 1 byte.
class E : public A, B { char ee; } e; // e is only 1 byte

What do objects of this class look like in memory?
It's entirely up to the compiler. An instance of an empty class must have non-zero size, so that distinct objects have distinct addresses (unless it's instantiated as a base class of another class, in which case it can take up no space at all). Typically, it will consist of a single uninitialised byte.
Whatever it is, is the compiler allowed to inline/optimize out these objects and treat the method calls as just normal free function calls?
Yes; the compiler doesn't have to create the object unless you do something like taking its address. Empty function objects are used quite a lot in the Standard Library, so it's important that they don't introduce any unnecessary overhead.

I performed the following experiment:
#include <iostream>
class B
{
public:
int FuncA( int x );
int FuncB( int y );
};
int main()
{
std::cout << sizeof( B ) ;
}
The result was 1 (VC++ 2010)
It seems to me that the class actually requires no memory whatsoever, but that an object cannot be zero sized since that would make no semantic sense if you took its address for example. This is borne out by Ferruccio's answer.s

Everything I say from here on out is implementation dependent - but most implementations will conform.
If the class has any virtual methods, there will be an invisible vtable pointer member. That isn't the case with your example however.
Yes, the compiler will treat a member function call the same as a free function call, again unless it's a virtual function. Even if it is a virtual function, the compiler can bypass the vtable if it knows the concrete type at the time of the call. Each call will still depend on the object, because there's an invisible this parameter with the object's pointer that gets added to the call.

I would think they just look like any objects in C++:
Each instance of the class occupies space. Because objects in C++ must have a size of at least 1 (so they have unique addresses, as Ferruccino said), objects that don't specify any data don't receive special treatment.
Non-virtual functions do not occupy any space at all in a class. Rather, they can be thought of as functions like this:
int B_FuncA(B *this, int x);
int B_FuncB(B *this, int y);
If this class can be used by other .cpp files, I think these will become actual class instances, not regular functions.
If you just want your functions to be free rather than bound to objects, you could either make them static or use a namespace.

I've been told that non-virtual member functions are in the end compiled as free functions somewhere completely unrelated to the instances of the class in which they're declared (virtual function are too, but the objects store a vtable with function pointers). That the access restrictions are merely at the semantic, "human" level. Only the data members of a class (and the vtable etc.) actually make up the memory structure of objects.
Yep, that is usually how it works. It might be worth pointing out the distinction that this isn't specified in the standard, and it's not required -- it just makes sense to implement classes like this in the compiler.
So again, what do objects of class B look like in memory? Is it some kind of placeholder value? Something has to be there, I can take the object's address
Exactly. :)
The C++ standard requires that objects take up at least one byte, for exactly the reason you say. It must have an address, and if I put these objects into an array, I must be able to increment a pointer in order to get "the next" object, so every object must have a unique address and take up at least 1 byte. (Of course, empty objects don't have to take exactly 1 byte. Some compilers may choose to make them 4 bytes, or any other size, for performance reasons)
A sensible compiler won't even make it a placeholder value though. Why bother writing any specific value into this one byte? We can just let it contain whatever garbage it held when the object was created. It'll never be accessed anyway. A single byte is just allocated, and never read or written to.

Related

A function to return different derived-type object/reference not a pointer [duplicate]

I did find some questions already on StackOverflow with similar title, but when I read the answers, they were focusing on different parts of the question, which were really specific (e.g. STL/containers).
Could someone please show me, why you must use pointers/references for implementing polymorphism? I can understand pointers may help, but surely references only differentiate between pass-by-value and pass-by-reference?
Surely so long as you allocate memory on the heap, so that you can have dynamic binding, then this would have been enough. Obviously not.
"Surely so long as you allocate memory on the heap" - where the memory is allocated has nothing to do with it. It's all about the semantics. Take, for instance:
Derived d;
Base* b = &d;
d is on the stack (automatic memory), but polymorphism will still work on b.
If you don't have a base class pointer or reference to a derived class, polymorphism doesn't work because you no longer have a derived class. Take
Base c = Derived();
The c object isn't a Derived, but a Base, because of slicing. So, technically, polymorphism still works, it's just that you no longer have a Derived object to talk about.
Now take
Base* c = new Derived();
c just points to some place in memory, and you don't really care whether that's actually a Base or a Derived, but the call to a virtual method will be resolved dynamically.
In C++, an object always has a fixed type and size known at compile-time and (if it can and does have its address taken) always exists at a fixed address for the duration of its lifetime. These are features inherited from C which help make both languages suitable for low-level systems programming. (All of this is subject to the as-if, rule, though: a conforming compiler is free to do whatever it pleases with code as long as it can be proven to have no detectable effect on any behavior of a conforming program that is guaranteed by the standard.)
A virtual function in C++ is defined (more or less, no need for extreme language lawyering) as executing based on the run-time type of an object; when called directly on an object this will always be the compile-time type of the object, so there is no polymorphism when a virtual function is called this way.
Note that this didn't necessarily have to be the case: object types with virtual functions are usually implemented in C++ with a per-object pointer to a table of virtual functions which is unique to each type. If so inclined, a compiler for some hypothetical variant of C++ could implement assignment on objects (such as Base b; b = Derived()) as copying both the contents of the object and the virtual table pointer along with it, which would easily work if both Base and Derived were the same size. In the case that the two were not the same size, the compiler could even insert code that pauses the program for an arbitrary amount of time in order to rearrange memory in the program and update all possible references to that memory in a way that could be proven to have no detectable effect on the semantics of the program, terminating the program if no such rearrangement could be found: this would be very inefficient, though, and could not be guaranteed to ever halt, obviously not desirable features for an assignment operator to have.
So in lieu of the above, polymorphism in C++ is accomplished by allowing references and pointers to objects to reference and point to objects of their declared compile-time types and any subtypes thereof. When a virtual function is called through a reference or pointer, and the compiler cannot prove that the object referenced or pointed to is of a run-time type with a specific known implementation of that virtual function, the compiler inserts code which looks up the correct virtual function to call a run-time. It did not have to be this way, either: references and pointers could have been defined as being non-polymorphic (disallowing them to reference or point to subtypes of their declared types) and forcing the programmer to come up with alternative ways of implementing polymorphism. The latter is clearly possible since it's done all the time in C, but at that point there's not much reason to have a new language at all.
In sum, the semantics of C++ are designed in such a way to allow the high-level abstraction and encapsulation of object-oriented polymorphism while still retaining features (like low-level access and explicit management of memory) which allow it to be suitable for low-level development. You could easily design a language that had some other semantics, but it would not be C++ and would have different benefits and drawbacks.
I found it helpful to understand that a copy constructor is invoked when assigning like this:
class Base { };
class Derived : public Base { };
Derived x; /* Derived type object created */
Base y = x; /* Copy is made (using Base's copy constructor), so y really is of type Base. Copy can cause "slicing" btw. */
Since y is an actual object of class Base, rather than the original one, functions called on this are Base's functions.
Consider little endian architectures: values are stored low-order-bytes first. So, for any given unsigned integer, the values 0-255 are stored in the first byte of the value. Accessing the low 8-bits of any value simply requires a pointer to it's address.
So we could implement uint8 as a class. We know that an instance of uint8 is ... one byte. If we derive from it and produce uint16, uint32, etc, the interface remains the same for purposes of abstraction, but the one most important change is size of the concrete instances of the object.
Of course, if we implemented uint8 and char, the sizes may be the same, likewise sint8.
However, operator= of uint8 and uint16 are going to move different quantities of data.
In order to create a Polymorphic function we must either be able to:
a/ receive the argument by value by copying the data into a new location of the correct size and layout,
b/ take a pointer to the object's location,
c/ take a reference to the object instance,
We can use templates to achieve a, so polymorphism can work without pointers and references, but if we are not counting templates, then lets consider what happens if we implement uint128 and pass it to a function expecting uint8? Answer: 8 bits get copied instead of 128.
So what if we made our polymorphic function accept uint128 and we passed it a uint8. If our uint8 we were copying was unfortunately located, our function would attempt to copy 128 bytes of which 127 were outside of our accessible memory -> crash.
Consider the following:
class A { int x; };
A fn(A a)
{
return a;
}
class B : public A {
uint64_t a, b, c;
B(int x_, uint64_t a_, uint64_t b_, uint64_t c_)
: A(x_), a(a_), b(b_), c(c_) {}
};
B b1 { 10, 1, 2, 3 };
B b2 = fn(b1);
// b2.x == 10, but a, b and c?
At the time fn was compiled, there was no knowledge of B. However, B is derived from A so polymorphism should allow that we can call fn with a B. However, the object it returns should be an A comprising a single int.
If we pass an instance of B to this function, what we get back should be just a { int x; } with no a, b, c.
This is "slicing".
Even with pointers and references we don't avoid this for free. Consider:
std::vector<A*> vec;
Elements of this vector could be pointers to A or something derived from A. The language generally solves this through the use of the "vtable", a small addition to the object's instance which identifies the type and provides function pointers for virtual functions. You can think of it as something like:
template<class T>
struct PolymorphicObject {
T::vtable* __vtptr;
T __instance;
};
Rather than every object having its own distinct vtable, classes have them, and object instances merely point to the relevant vtable.
The problem now is not slicing but type correctness:
struct A { virtual const char* fn() { return "A"; } };
struct B : public A { virtual const char* fn() { return "B"; } };
#include <iostream>
#include <cstring>
int main()
{
A* a = new A();
B* b = new B();
memcpy(a, b, sizeof(A));
std::cout << "sizeof A = " << sizeof(A)
<< " a->fn(): " << a->fn() << '\n';
}
http://ideone.com/G62Cn0
sizeof A = 4 a->fn(): B
What we should have done is use a->operator=(b)
http://ideone.com/Vym3Lp
but again, this is copying an A to an A and so slicing would occur:
struct A { int i; A(int i_) : i(i_) {} virtual const char* fn() { return "A"; } };
struct B : public A {
int j;
B(int i_) : A(i_), j(i_ + 10) {}
virtual const char* fn() { return "B"; }
};
#include <iostream>
#include <cstring>
int main()
{
A* a = new A(1);
B* b = new B(2);
*a = *b; // aka a->operator=(static_cast<A*>(*b));
std::cout << "sizeof A = " << sizeof(A)
<< ", a->i = " << a->i << ", a->fn(): " << a->fn() << '\n';
}
http://ideone.com/DHGwun
(i is copied, but B's j is lost)
The conclusion here is that pointers/references are required because the original instance carries membership information with it that copying may interact with.
But also, that polymorphism is not perfectly solved within C++ and one must be cognizant of their obligation to provide/block actions which could produce slicing.
You need pointers or reference because for the kind of polymorphism you are interested in (*), you need that the dynamic type could be different from the static type, in other words that the true type of the object is different than the declared type. In C++ that happens only with pointers or references.
(*) Genericity, the type of polymorphism provided by templates, doesn't need pointers nor references.
When an object is passed by value, it's typically put on the stack. Putting something on the stack requires knowledge of just how big it is. When using polymorphism, you know that the incoming object implements a particular set of features, but you usually have no idea the size of the object (nor should you, necessarily, that's part of the benefit). Thus, you can't put it on the stack. You do, however, always know the size of a pointer.
Now, not everything goes on the stack, and there are other extenuating circumstances. In the case of virtual methods, the pointer to the object is also a pointer to the object's vtable(s), which indicate where the methods are. This allows the compiler to find and call the functions, regardless of what object it's working with.
Another cause is that very often the object is implemented outside of the calling library, and allocated with a completely different (and possibly incompatible) memory manager. It could also have members that can't be copied, or would cause problems if they were copied with a different manager. There could be side-effects to copying and all sorts of other complications.
The result is that the pointer is the only bit of information on the object that you really properly understand, and provides enough information to figure out where the other bits you need are.

Check if list of abstract elements contains an element of a certain derived type in C++? [duplicate]

I did find some questions already on StackOverflow with similar title, but when I read the answers, they were focusing on different parts of the question, which were really specific (e.g. STL/containers).
Could someone please show me, why you must use pointers/references for implementing polymorphism? I can understand pointers may help, but surely references only differentiate between pass-by-value and pass-by-reference?
Surely so long as you allocate memory on the heap, so that you can have dynamic binding, then this would have been enough. Obviously not.
"Surely so long as you allocate memory on the heap" - where the memory is allocated has nothing to do with it. It's all about the semantics. Take, for instance:
Derived d;
Base* b = &d;
d is on the stack (automatic memory), but polymorphism will still work on b.
If you don't have a base class pointer or reference to a derived class, polymorphism doesn't work because you no longer have a derived class. Take
Base c = Derived();
The c object isn't a Derived, but a Base, because of slicing. So, technically, polymorphism still works, it's just that you no longer have a Derived object to talk about.
Now take
Base* c = new Derived();
c just points to some place in memory, and you don't really care whether that's actually a Base or a Derived, but the call to a virtual method will be resolved dynamically.
In C++, an object always has a fixed type and size known at compile-time and (if it can and does have its address taken) always exists at a fixed address for the duration of its lifetime. These are features inherited from C which help make both languages suitable for low-level systems programming. (All of this is subject to the as-if, rule, though: a conforming compiler is free to do whatever it pleases with code as long as it can be proven to have no detectable effect on any behavior of a conforming program that is guaranteed by the standard.)
A virtual function in C++ is defined (more or less, no need for extreme language lawyering) as executing based on the run-time type of an object; when called directly on an object this will always be the compile-time type of the object, so there is no polymorphism when a virtual function is called this way.
Note that this didn't necessarily have to be the case: object types with virtual functions are usually implemented in C++ with a per-object pointer to a table of virtual functions which is unique to each type. If so inclined, a compiler for some hypothetical variant of C++ could implement assignment on objects (such as Base b; b = Derived()) as copying both the contents of the object and the virtual table pointer along with it, which would easily work if both Base and Derived were the same size. In the case that the two were not the same size, the compiler could even insert code that pauses the program for an arbitrary amount of time in order to rearrange memory in the program and update all possible references to that memory in a way that could be proven to have no detectable effect on the semantics of the program, terminating the program if no such rearrangement could be found: this would be very inefficient, though, and could not be guaranteed to ever halt, obviously not desirable features for an assignment operator to have.
So in lieu of the above, polymorphism in C++ is accomplished by allowing references and pointers to objects to reference and point to objects of their declared compile-time types and any subtypes thereof. When a virtual function is called through a reference or pointer, and the compiler cannot prove that the object referenced or pointed to is of a run-time type with a specific known implementation of that virtual function, the compiler inserts code which looks up the correct virtual function to call a run-time. It did not have to be this way, either: references and pointers could have been defined as being non-polymorphic (disallowing them to reference or point to subtypes of their declared types) and forcing the programmer to come up with alternative ways of implementing polymorphism. The latter is clearly possible since it's done all the time in C, but at that point there's not much reason to have a new language at all.
In sum, the semantics of C++ are designed in such a way to allow the high-level abstraction and encapsulation of object-oriented polymorphism while still retaining features (like low-level access and explicit management of memory) which allow it to be suitable for low-level development. You could easily design a language that had some other semantics, but it would not be C++ and would have different benefits and drawbacks.
I found it helpful to understand that a copy constructor is invoked when assigning like this:
class Base { };
class Derived : public Base { };
Derived x; /* Derived type object created */
Base y = x; /* Copy is made (using Base's copy constructor), so y really is of type Base. Copy can cause "slicing" btw. */
Since y is an actual object of class Base, rather than the original one, functions called on this are Base's functions.
Consider little endian architectures: values are stored low-order-bytes first. So, for any given unsigned integer, the values 0-255 are stored in the first byte of the value. Accessing the low 8-bits of any value simply requires a pointer to it's address.
So we could implement uint8 as a class. We know that an instance of uint8 is ... one byte. If we derive from it and produce uint16, uint32, etc, the interface remains the same for purposes of abstraction, but the one most important change is size of the concrete instances of the object.
Of course, if we implemented uint8 and char, the sizes may be the same, likewise sint8.
However, operator= of uint8 and uint16 are going to move different quantities of data.
In order to create a Polymorphic function we must either be able to:
a/ receive the argument by value by copying the data into a new location of the correct size and layout,
b/ take a pointer to the object's location,
c/ take a reference to the object instance,
We can use templates to achieve a, so polymorphism can work without pointers and references, but if we are not counting templates, then lets consider what happens if we implement uint128 and pass it to a function expecting uint8? Answer: 8 bits get copied instead of 128.
So what if we made our polymorphic function accept uint128 and we passed it a uint8. If our uint8 we were copying was unfortunately located, our function would attempt to copy 128 bytes of which 127 were outside of our accessible memory -> crash.
Consider the following:
class A { int x; };
A fn(A a)
{
return a;
}
class B : public A {
uint64_t a, b, c;
B(int x_, uint64_t a_, uint64_t b_, uint64_t c_)
: A(x_), a(a_), b(b_), c(c_) {}
};
B b1 { 10, 1, 2, 3 };
B b2 = fn(b1);
// b2.x == 10, but a, b and c?
At the time fn was compiled, there was no knowledge of B. However, B is derived from A so polymorphism should allow that we can call fn with a B. However, the object it returns should be an A comprising a single int.
If we pass an instance of B to this function, what we get back should be just a { int x; } with no a, b, c.
This is "slicing".
Even with pointers and references we don't avoid this for free. Consider:
std::vector<A*> vec;
Elements of this vector could be pointers to A or something derived from A. The language generally solves this through the use of the "vtable", a small addition to the object's instance which identifies the type and provides function pointers for virtual functions. You can think of it as something like:
template<class T>
struct PolymorphicObject {
T::vtable* __vtptr;
T __instance;
};
Rather than every object having its own distinct vtable, classes have them, and object instances merely point to the relevant vtable.
The problem now is not slicing but type correctness:
struct A { virtual const char* fn() { return "A"; } };
struct B : public A { virtual const char* fn() { return "B"; } };
#include <iostream>
#include <cstring>
int main()
{
A* a = new A();
B* b = new B();
memcpy(a, b, sizeof(A));
std::cout << "sizeof A = " << sizeof(A)
<< " a->fn(): " << a->fn() << '\n';
}
http://ideone.com/G62Cn0
sizeof A = 4 a->fn(): B
What we should have done is use a->operator=(b)
http://ideone.com/Vym3Lp
but again, this is copying an A to an A and so slicing would occur:
struct A { int i; A(int i_) : i(i_) {} virtual const char* fn() { return "A"; } };
struct B : public A {
int j;
B(int i_) : A(i_), j(i_ + 10) {}
virtual const char* fn() { return "B"; }
};
#include <iostream>
#include <cstring>
int main()
{
A* a = new A(1);
B* b = new B(2);
*a = *b; // aka a->operator=(static_cast<A*>(*b));
std::cout << "sizeof A = " << sizeof(A)
<< ", a->i = " << a->i << ", a->fn(): " << a->fn() << '\n';
}
http://ideone.com/DHGwun
(i is copied, but B's j is lost)
The conclusion here is that pointers/references are required because the original instance carries membership information with it that copying may interact with.
But also, that polymorphism is not perfectly solved within C++ and one must be cognizant of their obligation to provide/block actions which could produce slicing.
You need pointers or reference because for the kind of polymorphism you are interested in (*), you need that the dynamic type could be different from the static type, in other words that the true type of the object is different than the declared type. In C++ that happens only with pointers or references.
(*) Genericity, the type of polymorphism provided by templates, doesn't need pointers nor references.
When an object is passed by value, it's typically put on the stack. Putting something on the stack requires knowledge of just how big it is. When using polymorphism, you know that the incoming object implements a particular set of features, but you usually have no idea the size of the object (nor should you, necessarily, that's part of the benefit). Thus, you can't put it on the stack. You do, however, always know the size of a pointer.
Now, not everything goes on the stack, and there are other extenuating circumstances. In the case of virtual methods, the pointer to the object is also a pointer to the object's vtable(s), which indicate where the methods are. This allows the compiler to find and call the functions, regardless of what object it's working with.
Another cause is that very often the object is implemented outside of the calling library, and allocated with a completely different (and possibly incompatible) memory manager. It could also have members that can't be copied, or would cause problems if they were copied with a different manager. There could be side-effects to copying and all sorts of other complications.
The result is that the pointer is the only bit of information on the object that you really properly understand, and provides enough information to figure out where the other bits you need are.

Call derived class method when upcasted [duplicate]

I did find some questions already on StackOverflow with similar title, but when I read the answers, they were focusing on different parts of the question, which were really specific (e.g. STL/containers).
Could someone please show me, why you must use pointers/references for implementing polymorphism? I can understand pointers may help, but surely references only differentiate between pass-by-value and pass-by-reference?
Surely so long as you allocate memory on the heap, so that you can have dynamic binding, then this would have been enough. Obviously not.
"Surely so long as you allocate memory on the heap" - where the memory is allocated has nothing to do with it. It's all about the semantics. Take, for instance:
Derived d;
Base* b = &d;
d is on the stack (automatic memory), but polymorphism will still work on b.
If you don't have a base class pointer or reference to a derived class, polymorphism doesn't work because you no longer have a derived class. Take
Base c = Derived();
The c object isn't a Derived, but a Base, because of slicing. So, technically, polymorphism still works, it's just that you no longer have a Derived object to talk about.
Now take
Base* c = new Derived();
c just points to some place in memory, and you don't really care whether that's actually a Base or a Derived, but the call to a virtual method will be resolved dynamically.
In C++, an object always has a fixed type and size known at compile-time and (if it can and does have its address taken) always exists at a fixed address for the duration of its lifetime. These are features inherited from C which help make both languages suitable for low-level systems programming. (All of this is subject to the as-if, rule, though: a conforming compiler is free to do whatever it pleases with code as long as it can be proven to have no detectable effect on any behavior of a conforming program that is guaranteed by the standard.)
A virtual function in C++ is defined (more or less, no need for extreme language lawyering) as executing based on the run-time type of an object; when called directly on an object this will always be the compile-time type of the object, so there is no polymorphism when a virtual function is called this way.
Note that this didn't necessarily have to be the case: object types with virtual functions are usually implemented in C++ with a per-object pointer to a table of virtual functions which is unique to each type. If so inclined, a compiler for some hypothetical variant of C++ could implement assignment on objects (such as Base b; b = Derived()) as copying both the contents of the object and the virtual table pointer along with it, which would easily work if both Base and Derived were the same size. In the case that the two were not the same size, the compiler could even insert code that pauses the program for an arbitrary amount of time in order to rearrange memory in the program and update all possible references to that memory in a way that could be proven to have no detectable effect on the semantics of the program, terminating the program if no such rearrangement could be found: this would be very inefficient, though, and could not be guaranteed to ever halt, obviously not desirable features for an assignment operator to have.
So in lieu of the above, polymorphism in C++ is accomplished by allowing references and pointers to objects to reference and point to objects of their declared compile-time types and any subtypes thereof. When a virtual function is called through a reference or pointer, and the compiler cannot prove that the object referenced or pointed to is of a run-time type with a specific known implementation of that virtual function, the compiler inserts code which looks up the correct virtual function to call a run-time. It did not have to be this way, either: references and pointers could have been defined as being non-polymorphic (disallowing them to reference or point to subtypes of their declared types) and forcing the programmer to come up with alternative ways of implementing polymorphism. The latter is clearly possible since it's done all the time in C, but at that point there's not much reason to have a new language at all.
In sum, the semantics of C++ are designed in such a way to allow the high-level abstraction and encapsulation of object-oriented polymorphism while still retaining features (like low-level access and explicit management of memory) which allow it to be suitable for low-level development. You could easily design a language that had some other semantics, but it would not be C++ and would have different benefits and drawbacks.
I found it helpful to understand that a copy constructor is invoked when assigning like this:
class Base { };
class Derived : public Base { };
Derived x; /* Derived type object created */
Base y = x; /* Copy is made (using Base's copy constructor), so y really is of type Base. Copy can cause "slicing" btw. */
Since y is an actual object of class Base, rather than the original one, functions called on this are Base's functions.
Consider little endian architectures: values are stored low-order-bytes first. So, for any given unsigned integer, the values 0-255 are stored in the first byte of the value. Accessing the low 8-bits of any value simply requires a pointer to it's address.
So we could implement uint8 as a class. We know that an instance of uint8 is ... one byte. If we derive from it and produce uint16, uint32, etc, the interface remains the same for purposes of abstraction, but the one most important change is size of the concrete instances of the object.
Of course, if we implemented uint8 and char, the sizes may be the same, likewise sint8.
However, operator= of uint8 and uint16 are going to move different quantities of data.
In order to create a Polymorphic function we must either be able to:
a/ receive the argument by value by copying the data into a new location of the correct size and layout,
b/ take a pointer to the object's location,
c/ take a reference to the object instance,
We can use templates to achieve a, so polymorphism can work without pointers and references, but if we are not counting templates, then lets consider what happens if we implement uint128 and pass it to a function expecting uint8? Answer: 8 bits get copied instead of 128.
So what if we made our polymorphic function accept uint128 and we passed it a uint8. If our uint8 we were copying was unfortunately located, our function would attempt to copy 128 bytes of which 127 were outside of our accessible memory -> crash.
Consider the following:
class A { int x; };
A fn(A a)
{
return a;
}
class B : public A {
uint64_t a, b, c;
B(int x_, uint64_t a_, uint64_t b_, uint64_t c_)
: A(x_), a(a_), b(b_), c(c_) {}
};
B b1 { 10, 1, 2, 3 };
B b2 = fn(b1);
// b2.x == 10, but a, b and c?
At the time fn was compiled, there was no knowledge of B. However, B is derived from A so polymorphism should allow that we can call fn with a B. However, the object it returns should be an A comprising a single int.
If we pass an instance of B to this function, what we get back should be just a { int x; } with no a, b, c.
This is "slicing".
Even with pointers and references we don't avoid this for free. Consider:
std::vector<A*> vec;
Elements of this vector could be pointers to A or something derived from A. The language generally solves this through the use of the "vtable", a small addition to the object's instance which identifies the type and provides function pointers for virtual functions. You can think of it as something like:
template<class T>
struct PolymorphicObject {
T::vtable* __vtptr;
T __instance;
};
Rather than every object having its own distinct vtable, classes have them, and object instances merely point to the relevant vtable.
The problem now is not slicing but type correctness:
struct A { virtual const char* fn() { return "A"; } };
struct B : public A { virtual const char* fn() { return "B"; } };
#include <iostream>
#include <cstring>
int main()
{
A* a = new A();
B* b = new B();
memcpy(a, b, sizeof(A));
std::cout << "sizeof A = " << sizeof(A)
<< " a->fn(): " << a->fn() << '\n';
}
http://ideone.com/G62Cn0
sizeof A = 4 a->fn(): B
What we should have done is use a->operator=(b)
http://ideone.com/Vym3Lp
but again, this is copying an A to an A and so slicing would occur:
struct A { int i; A(int i_) : i(i_) {} virtual const char* fn() { return "A"; } };
struct B : public A {
int j;
B(int i_) : A(i_), j(i_ + 10) {}
virtual const char* fn() { return "B"; }
};
#include <iostream>
#include <cstring>
int main()
{
A* a = new A(1);
B* b = new B(2);
*a = *b; // aka a->operator=(static_cast<A*>(*b));
std::cout << "sizeof A = " << sizeof(A)
<< ", a->i = " << a->i << ", a->fn(): " << a->fn() << '\n';
}
http://ideone.com/DHGwun
(i is copied, but B's j is lost)
The conclusion here is that pointers/references are required because the original instance carries membership information with it that copying may interact with.
But also, that polymorphism is not perfectly solved within C++ and one must be cognizant of their obligation to provide/block actions which could produce slicing.
You need pointers or reference because for the kind of polymorphism you are interested in (*), you need that the dynamic type could be different from the static type, in other words that the true type of the object is different than the declared type. In C++ that happens only with pointers or references.
(*) Genericity, the type of polymorphism provided by templates, doesn't need pointers nor references.
When an object is passed by value, it's typically put on the stack. Putting something on the stack requires knowledge of just how big it is. When using polymorphism, you know that the incoming object implements a particular set of features, but you usually have no idea the size of the object (nor should you, necessarily, that's part of the benefit). Thus, you can't put it on the stack. You do, however, always know the size of a pointer.
Now, not everything goes on the stack, and there are other extenuating circumstances. In the case of virtual methods, the pointer to the object is also a pointer to the object's vtable(s), which indicate where the methods are. This allows the compiler to find and call the functions, regardless of what object it's working with.
Another cause is that very often the object is implemented outside of the calling library, and allocated with a completely different (and possibly incompatible) memory manager. It could also have members that can't be copied, or would cause problems if they were copied with a different manager. There could be side-effects to copying and all sorts of other complications.
The result is that the pointer is the only bit of information on the object that you really properly understand, and provides enough information to figure out where the other bits you need are.

How does a copy of the data in one data structure to another work in C++?

I have a class that looks like:
class A
{
public:
A();
void MethodThatWillNotBeInOtherStructure();
inline int getData() { return data; }
inline void setData(int data_) { data = data_; }
private:
int data;
}
and a structure like:
struct B
{
public:
inline int getData() { return data; }
inline void setData(int data_) { data = data_; }
private:
int data;
}
How can I copy an instance of A to B without individually setting the fields? I know I can as I have seen code that would take a void* of say A and pass it to a function expecting B and it work. My big question also, is how does this work? I suppose it has something to do with memcpy, but I don't know how the memory layout for the structure and the class will be. For example, how do the functions that are in one but not the other not get in the way of the memcpy? Could someone explain this to me?
Update
Ok, let me explain. I am not saying I would ever do this in reusable code or that I would ever use it period. I still want to know how it works. Does a class have a different memory layout than a structure? How are the methods stored? Where is the data stored?
Thanks!
Copying two unrelated structures in to each other through void * would work properly only if the two structures have the same memory layout. Otherwise the copying fails.
Note that the objects of structure A and B in your above code will have the same memory layout, since there member variables are identical.
Copying through void * works because one is just copying the actual memory occupied by one structure object in to memory occupied by another structure object.
It is basically a bad idea to copy two unrelated structures in this way.
Consider the situation where you have pointers members inside your structure, a memcpy would just cause a shallow copy of the pointer members, And if one of the object finishes its lifetime then eventually, the other object will be left with a dangling pointer member. That would eventually lead to an Undefined behavior(most likely a crash).
How are the methods stored? Where is the data stored?
A normal function(non virtual) will be stored somewhere in the code section of the program. This location is the same for all instances of the class/structure and hence it is not a part of the memory allocation of each object of the class/structure.
In case of a virtual member function, the size of an class/structure does get affected due to presence of virtual functions, each object of the class/structure then has a special pointer called vptr inside each this. Note that this is implementation detail of compilers and compilers may choose to implement it differently.
Knowing the memory layout is quite important when you're using unsafe mechanisms like memcpy to copy the structure. Once it's modified later, you entire logic may screwed up.
Objects of a class doesn't contain the functions. The memory of an object contains only the attributes and the required size for it. On the other hand, functions are executable peice of code which is common across the program and will not influence the structure's memory layout.
I'd suggest you to define operator=, constructors to appropriately casting one object to another.
The additional glitch on memcpy is that, the object may contain virtual pointer if the class has virtual functions. Additional pointer data may also be copied to the destination memory; which is not really good!
You can copy the A to B using memcpy since they have the same member variables. The functions are not part the instances, so they don't matter.
I would recommend against this approach. If either A or B changes, then your copy will fail at run-time. You can make a a constructor of B which takes A, a conversion function, or something. Though it will require a little more code, it will allow for changes to the structures.
Your best bet is probably to implement constructor of struct B that takes a const reference to a struct A. Use that directly or use the assignment operator (which you probably will need to implement for non-trivial cases).
A a;
... (populate a)
B b(a); //If you want to set b to a at instantiation.
b = B(a); //If you want to overwrite an existing instance of struct B with a.
EDIT:
To respond to your edit, it is entirely compiler dependent. The fact that it works at all for a class is reliant on compiler details, since AFAIK it's not supported by the standard. The compiler devs could decide to mess with people by making it work for CLASS but not STRUCT (or vice versa).
That said, I've never seen any difference in any compiler I've used. I would expect them to map memory identically.

Managing C++ objects in a buffer, considering the alignment and memory layout assumptions

I am storing objects in a buffer. Now I know that I cannot make assumptions about the memory layout of the object.
If I know the overall size of the object, is it acceptible to create a pointer to this memory and call functions on it?
e.g. say I have the following class:
[int,int,int,int,char,padding*3bytes,unsigned short int*]
1)
if I know this class to be of size 24 and I know the address of where it starts in memory
whilst it is not safe to assume the memory layout is it acceptible to cast this to a pointer and call functions on this object which access these members?
(Does c++ know by some magic the correct position of a member?)
2)
If this is not safe/ok, is there any other way other than using a constructor which takes all of the arguments and pulling each argument out of the buffer one at a time?
Edit: Changed title to make it more appropriate to what I am asking.
You can create a constructor that takes all the members and assigns them, then use placement new.
class Foo
{
int a;int b;int c;int d;char e;unsigned short int*f;
public:
Foo(int A,int B,int C,int D,char E,unsigned short int*F) : a(A), b(B), c(C), d(D), e(E), f(F) {}
};
...
char *buf = new char[sizeof(Foo)]; //pre-allocated buffer
Foo *f = new (buf) Foo(a,b,c,d,e,f);
This has the advantage that even the v-table will be generated correctly. Note, however, if you are using this for serialization, the unsigned short int pointer is not going to point at anything useful when you deserialize it, unless you are very careful to use some sort of method to convert pointers into offsets and then back again.
Individual methods on a this pointer are statically linked and are simply a direct call to the function with this being the first parameter before the explicit parameters.
Member variables are referenced using an offset from the this pointer. If an object is laid out like this:
0: vtable
4: a
8: b
12: c
etc...
a will be accessed by dereferencing this + 4 bytes.
Basically what you are proposing doing is reading in a bunch of (hopefully not random) bytes, casting them to a known object, and then calling a class method on that object. It might actually work, because those bytes are going to end up in the "this" pointer in that class method. But you're taking a real chance on things not being where the compiled code expects it to be. And unlike Java or C#, there is no real "runtime" to catch these sorts of problems, so at best you'll get a core dump, and at worse you'll get corrupted memory.
It sounds like you want a C++ version of Java's serialization/deserialization. There is probably a library out there to do that.
Non-virtual function calls are linked directly just like a C function. The object (this) pointer is passed as the first argument. No knowledge of the object layout is required to call the function.
It sounds like you're not storing the objects themselves in a buffer, but rather the data from which they're comprised.
If this data is in memory in the order the fields are defined within your class (with proper padding for the platform) and your type is a POD, then you can memcpy the data from the buffer to a pointer to your type (or possibly cast it, but beware, there are some platform-specific gotchas with casts to pointers of different types).
If your class is not a POD, then the in-memory layout of fields is not guaranteed, and you shouldn't rely on any observed ordering, as it is allowed to change on each recompile.
You can, however, initialize a non-POD with data from a POD.
As far as the addresses where non-virtual functions are located: they are statically linked at compile time to some location within your code segment that is the same for every instance of your type. Note that there is no "runtime" involved. When you write code like this:
class Foo{
int a;
int b;
public:
void DoSomething(int x);
};
void Foo::DoSomething(int x){a = x * 2; b = x + a;}
int main(){
Foo f;
f.DoSomething(42);
return 0;
}
the compiler generates code that does something like this:
function main:
allocate 8 bytes on stack for object "f"
call default initializer for class "Foo" (does nothing in this case)
push argument value 42 onto stack
push pointer to object "f" onto stack
make call to function Foo_i_DoSomething#4 (actual name is usually more complex)
load return value 0 into accumulator register
return to caller
function Foo_i_DoSomething#4 (located elsewhere in the code segment)
load "x" value from stack (pushed on by caller)
multiply by 2
load "this" pointer from stack (pushed on by caller)
calculate offset of field "a" within a Foo object
add calculated offset to this pointer, loaded in step 3
store product, calculated in step 2, to offset calculated in step 5
load "x" value from stack, again
load "this" pointer from stack, again
calculate offset of field "a" within a Foo object, again
add calculated offset to this pointer, loaded in step 8
load "a" value stored at offset,
add "a" value, loaded int step 12, to "x" value loaded in step 7
load "this" pointer from stack, again
calculate offset of field "b" within a Foo object
add calculated offset to this pointer, loaded in step 14
store sum, calculated in step 13, to offset calculated in step 16
return to caller
In other words, it would be more or less the same code as if you had written this (specifics, such as name of DoSomething function and method of passing this pointer are up to the compiler):
class Foo{
int a;
int b;
friend void Foo_DoSomething(Foo *f, int x);
};
void Foo_DoSomething(Foo *f, int x){
f->a = x * 2;
f->b = x + f->a;
}
int main(){
Foo f;
Foo_DoSomething(&f, 42);
return 0;
}
A object having POD type, in this case, is already created (Whether or not you call new. Allocating the required storage already suffices), and you can access the members of it, including calling a function on that object. But that will only work if you precisely know the required alignment of T, and the size of T (the buffer may not be smaller than it), and the alignment of all the members of T. Even for a pod type, the compiler is allowed to put padding bytes between members, if it wants. For a non-POD types, you can have the same luck if your type has no virtual functions or base classes, no user defined constructor (of course) and that applies to the base and all its non-static members too.
For all other types, all bets are off. You have to read values out first with a POD, and then initialize a non-POD type with that data.
I am storing objects in a buffer. ... If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?
This is acceptable to the extent that using casts is acceptable:
#include <iostream>
namespace {
class A {
int i;
int j;
public:
int value()
{
return i + j;
}
};
}
int main()
{
char buffer[] = { 1, 2 };
std::cout << reinterpret_cast<A*>(buffer)->value() << '\n';
}
Casting an object to something like raw memory and back again is actually pretty common, especially in the C world. If you're using a class hierarchy, though, it would make more sense to use pointer to member functions.
say I have the following class: ...
if I know this class to be of size 24 and I know the address of where it starts in memory ...
This is where things get difficult. The size of an object includes the size of its data members (and any data members from any base classes) plus any padding plus any function pointers or implementation-dependent information, minus anything saved from certain size optimizations (empty base class optimization). If the resulting number is 0 bytes, then the object is required to take at least one byte in memory. These things are a combination of language issues and common requirements that most CPUs have regarding memory accesses. Trying to get things to work properly can be a real pain.
If you just allocate an object and cast to and from raw memory you can ignore these issues. But if you copy an object's internals to a buffer of some sort, then they rear their head pretty quickly. The code above relies on a few general rules about alignment (i.e., I happen to know that class A will have the same alignment restrictions as ints, and thus the array can be safely cast to an A; but I couldn't necessarily guarantee the same if I were casting parts of the array to A's and parts to other classes with other data members).
Oh, and when copying objects you need to make sure you're properly handling pointers.
You may also be interested in things like Google's Protocol Buffers or Facebook's Thrift.
Yes these issues are difficult. And, yes, some programming languages sweep them under the rug. But there's an awful lot of stuff getting swept under the rug:
In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary. On top of this, every object has a 2-word header in memory. The JVM's word size is usually the platform's native pointer size. (An object consisting of only a 32-bit int and a 64-bit double -- 96 bits of data -- will require) two words for the object header, one word for the int, two words for the double. That's 5 words: 160 bits. Because of the alignment, this object will occupy 192 bits of memory.
This is because Sun is relying on a relatively simple tactic for memory alignment issues (on an imaginary processor, a char may be allowed to exist at any memory location, an int at any location that is divisible by 4, and a double may need to be allocated only on memory locations that are divisible by 32 -- but the most restrictive alignment requirement also satisfies every other alignment requirement, so Sun is aligning everything according to the most restrictive location).
Another tactic for memory alignment can reclaim some of that space.
If the class contains no virtual functions (and therefore class instances have no vptr), and if you make correct assumptions about the way in which the class' member data is laid out in memory, then doing what you're suggesting might work (but might not be portable).
Yes, another way (more idiomatic but not much safer ... you still need to know how the class lays out its data) would be to use the so-called "placement operator new" and a default constructor.
That depends upon what you mean by "safe". Any time you cast a memory address into a point in this way you are bypassing the type safety features provided by the compiler, and taking the responsibility to yourself. If, as Chris implies, you make an incorrect assumption about the memory layout, or compiler implementation details, then you will get unexpected results and loose portability.
Since you are concerned about the "safety" of this programming style it is likely worth your while to investigate portable and type-safe methods such as pre-existing libraries, or writing a constructor or assignment operator for the purpose.