Suppose we have a class:
#include <cstdio>
class Foo
{
private:
    int a;
public:
    void func()
    {
        a = 0;
        printf("In Func\n");
    }
};
int main()
{
    Foo *foo = new Foo();
    foo->func();
    delete foo; // release the object allocated with new
    return 0;
}
When the object of the class Foo is created and initialized, I understand that integer a will take up 4 bytes of memory. How is the function stored? What happens in memory / stack /registers / with the program counter when calling foo->func()?
The short answer: It will be stored in the text or code section of the binary only once irrespective of the number of instances of the class created.
The functions are not stored separately for each instance of a class. They are treated the same way as any non-member function would be. The only difference is that the compiler adds an extra parameter to the function, which is a pointer of the class type.
For example the compiler will generate the function prototype like this:
void func(Foo* this);
(Note that this may not be the final signature. The final signature can be much more cryptic depending on various factors including the compiler)
Any reference to a member variable will be replaced by
this-><member> //for your ex. a=0 translates to this->a = 0;
So the line foo->func(); roughly translates to:
Push the value of the Foo* (the address held in foo) onto the stack, or load it into a register. #compiler and ABI dependent
Execute a call to func, which makes the instruction pointer jump to the address of func in the executable. #architecture dependent
func reads that value from the stack or register. Any further reference to a member variable is preceded by a dereference of this value.
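The translation described above can be written out by hand. This is only a sketch: Foo_func is a made-up name for the function the compiler would generate (the real mangled name differs), and value() is a getter added here so the result can be checked.

```cpp
#include <cassert>

// Mirrors the question's Foo; value() is added only so the effect can be observed.
class Foo {
    int a = -1;
public:
    void func() { a = 0; }                 // uses the implicit `this`
    int value() const { return a; }
    friend void Foo_func(Foo* self);       // hypothetical hand-written equivalent
};

// Roughly what exists once in the code section: a plain function
// that receives the object's address as an explicit first argument.
void Foo_func(Foo* self) { self->a = 0; }

// Both spellings leave their object in the same state.
void check_same_effect() {
    Foo x, y;
    x.func();        // member call syntax
    Foo_func(&y);    // explicit-pointer equivalent
    assert(x.value() == 0);
    assert(y.value() == 0);
}
```

Calling check_same_effect() from main runs both variants against separate objects.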
Your function is not virtual, so it is called statically: the compiler inserts a jump to the code segment corresponding to your function. No additional memory is used per instance.
Were your function virtual, your instance would carry a vpointer, which would be dereferenced to find its class' vtable, which would then be indexed to find the function pointer to be called, and finally jump there. The additional cost is thus one vtable per class (probably the size of one function pointer, times the number of virtual functions of your class), and one pointer per instance.
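One way to convince yourself that non-virtual member functions cost nothing per instance is to compare sizes. A small sketch with two hypothetical structs; the size equality holds on the mainstream compilers I know of, but treat it as an observation rather than a promise made by the standard for every possible class.

```cpp
#include <cassert>

struct DataOnly { int a; };

struct WithMethods {
    int a;
    void set(int v) { a = v; }
    int get() const { return a; }
    int twice() const { return 2 * a; }
};

// Three member functions later, the object is still just one int.
static_assert(sizeof(WithMethods) == sizeof(DataOnly),
              "non-virtual member functions are not stored in the object");

void check_methods_still_work() {
    WithMethods w{};
    w.set(21);
    assert(w.get() == 21);
    assert(w.twice() == 42);
}
```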
Note that this is a common implementation of virtual calls, but is in no way enforced by the standard, so it could actually not be implemented like that at all but your chances are quite good. The compiler can also often bypass the virtual call system altogether if it has knowledge of the static type of your instance at compile time.
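The per-instance cost of a virtual function can be observed with sizeof. This sketch assumes the usual vpointer implementation; as said above, the standard does not require it, so the asserted sizes are what common compilers produce, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

struct Plain       { int a; void f() {} };
struct OneVirtual  { int a; virtual void f() {} };
struct TwoVirtuals { int a; virtual void f() {} virtual void g() {} };

// Extra bytes per object once the class has at least one virtual function.
std::size_t vptr_overhead() { return sizeof(OneVirtual) - sizeof(Plain); }

void check_virtual_sizes() {
    // One hidden pointer (plus possible padding) is added per instance...
    assert(sizeof(OneVirtual) > sizeof(Plain));
    // ...but more virtual functions grow the per-class table, not the object.
    assert(sizeof(TwoVirtuals) == sizeof(OneVirtual));
}
```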
Member functions are just like regular functions, they are stored in the "code" or "text" section. There is one thing special with (non-static) member functions, and that is the "hidden" this argument that is passed along to function. So in your case, the address in foo will be passed to func.
Exactly how that argument is passed, and what happens to registers and stack, is defined by the ABI (Application Binary Interface), and varies from processor to processor. There is no single answer unless you tell us which combination of compiler, OS and processor is being used (and assuming that information is then publicly available; not all compiler/OS vendors document this very clearly). As an example, x86-64 will use RCX for this on Windows, and RDI on Linux, and the call instruction will automatically push the return address onto the stack. On an ARM processor [in Linux, but I think the same applies in Windows, I just have never looked at that], R0 is used for the this pointer, and the BL instruction (or BLX for an indirect call) is used for the call, which as part of its operation stores the address of the instruction to return to in lr. lr then has to be saved [probably on the stack] in func, since it calls printf.
I was experimenting with C++ and found the below code as very strange.
class Foo{
public:
virtual void say_virtual_hi(){
std::cout << "Virtual Hi";
}
void say_hi()
{
std::cout << "Hi";
}
};
int main(int argc, char** argv)
{
Foo* foo = 0;
foo->say_hi(); // works well
foo->say_virtual_hi(); // will crash the app
return 0;
}
I know that the virtual method call crashes because it requires a vtable lookup and can only work with valid objects.
I have the following questions
How does the non virtual method say_hi work on a NULL pointer?
Where does the object foo get allocated?
Any thoughts?
The object foo is a local variable with type Foo*. That variable likely gets allocated on the stack for the main function, just like any other local variable. But the value stored in foo is a null pointer. It doesn't point anywhere. There is no instance of type Foo represented anywhere.
To call a virtual function, the caller needs to know which object the function is being called on. That's because the object itself is what tells which function should really be called. (That's frequently implemented by giving the object a pointer to a vtable, a list of function-pointers, and the caller just knows it's supposed to call the first function on the list, without knowing in advance where that pointer points.)
But to call a non-virtual function, the caller doesn't need to know all that. The compiler knows exactly which function will get called, so it can generate a CALL machine-code instruction to go directly to the desired function. It simply passes a pointer to the object the function was called on as a hidden parameter to the function. In other words, the compiler translates your function call into this:
void Foo_say_hi(Foo* this);
Foo_say_hi(foo);
Now, since the implementation of that function never makes reference to any members of the object pointed to by its this argument, you effectively dodge the bullet of dereferencing a null pointer because you never dereference one.
Formally, calling any function — even a non-virtual one — on a null pointer is undefined behavior. One of the allowed results of undefined behavior is that your code appears to run exactly as you intended. You shouldn't rely on that, although you will sometimes find libraries from your compiler vendor that do rely on that. But the compiler vendor has the advantage of being able to add further definition to what would otherwise be undefined behavior. Don't do it yourself.
The say_hi() member function is usually implemented by the compiler as
void say_hi(Foo *this);
Since you don't access any members, your call succeeds (even though you are entering undefined behaviour according to the standard).
Foo doesn't get allocated at all.
Dereferencing a NULL pointer causes "undefined behaviour". This means that anything could happen: your code may even appear to work correctly. You must not depend on this, however. If you run the same code on a different platform (or even possibly on the same platform) it will probably crash.
In your code there is no Foo object, only a pointer which is initialised with the value NULL.
It is undefined behaviour, but most compilers generate instructions which will handle this situation correctly if you access neither member variables nor the virtual table.
Let's see the disassembly generated by Visual Studio to understand what happens:
Foo* foo = 0;
004114BE mov dword ptr [foo],0
foo->say_hi(); // works well
004114C5 mov ecx,dword ptr [foo]
004114C8 call Foo::say_hi (411091h)
foo->say_virtual_hi(); // will crash the app
004114CD mov eax,dword ptr [foo]
004114D0 mov edx,dword ptr [eax]
004114D2 mov esi,esp
004114D4 mov ecx,dword ptr [foo]
004114D7 mov eax,dword ptr [edx]
004114D9 call eax
As you can see, Foo::say_hi is called as a normal function, but with this in the ecx register. To simplify, you can assume that this is passed as an implicit parameter, which is never used in your example.
But in the second case the address of the function must be calculated, due to the virtual table - this requires that the address of foo is valid and causes a crash.
a) It works because it does not dereference anything through the implicit "this" pointer. As soon as you do that, boom. I'm not 100% sure, but I think null pointer dereferences are caught by read/write-protecting the lowest part of the address space (the first 1K or so), so there is a small chance of a null dereference not getting caught if you only dereference past that protected region (i.e. some instance variable that sits at a large offset), like:
class A {
    char foo[2048];
    int i;
};
then a->i would possibly go uncaught when a is null.
b) Nowhere; you only declared a pointer, which is allocated on main()'s stack.
It's important to realize that both calls produce undefined behavior, and that behavior may manifest in unexpected ways. Even if the call appears to work, it may be laying down a minefield.
Consider this small change to your example:
Foo* foo = 0;
foo->say_hi(); // appears to work
if (foo != 0)
foo->say_virtual_hi(); // why does it still crash?
Since the first call to foo enables undefined behavior if foo is null, the compiler is now free to assume that foo is not null. That makes the if (foo != 0) redundant, and the compiler can optimize it out! You might think this is a very senseless optimization, but the compiler writers have been getting very aggressive, and something like this has happened in actual code.
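A well-defined version puts the null check before the first member call, and makes the function that needs no object static so that no pointer is involved at all. This is a sketch with a reworked Foo that returns strings instead of printing; greet is a hypothetical helper, not part of the original code.

```cpp
#include <cassert>
#include <string>

class Foo {
public:
    // Needs no object, so it is static and can be called without any pointer.
    static std::string hi() { return "Hi"; }
    virtual std::string virtual_hi() { return "Virtual Hi"; }
    virtual ~Foo() = default;
};

// The check comes first, so the compiler cannot assume foo is non-null
// on the basis of an earlier dereference.
std::string greet(Foo* foo) {
    if (foo == nullptr) {
        return Foo::hi();          // no object required
    }
    return foo->virtual_hi();      // safe: foo points at a real object
}

void check_greet() {
    Foo real;
    assert(greet(nullptr) == "Hi");
    assert(greet(&real) == "Virtual Hi");
}
```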
The call to say_hi is statically bound. So the computer actually simply does a standard call to a function. The function doesn't use any fields, so there is no problem.
The call to say_virtual_hi is dynamically bound, so the processor goes to the virtual table, and since there is no virtual table there, it jumps somewhere random and crashes the program.
In the original days of C++, the C++ code was converted to C. Object methods are converted to non-object methods like this (in your case):
foo_say_hi(Foo* thisPtr, /* other args */)
{
}
Of course, the name foo_say_hi is simplified. For more details, look up C++ name mangling.
As you can see, if the thisPtr is never dereferenced, then the code is fine and succeeds. In your case, no instance variables or anything that depends on the thisPtr was used.
However, virtual functions are different. Before the call can be made, the vtable pointer stored inside the object has to be read to find the right function. This will dereference the thisPtr and cause the exception.
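To see why the virtual call has to touch the object, it helps to build the mechanism by hand in the C style described above. Everything here (Shape, ShapeVTable, call_name) is a hypothetical illustration of the usual implementation, not what any particular compiler literally emits.

```cpp
#include <cassert>
#include <string>

struct Shape;

// One table per "class": a list of function pointers.
struct ShapeVTable {
    std::string (*name)(const Shape* self);
};

// The hidden vtable pointer made explicit as an ordinary member.
struct Shape {
    const ShapeVTable* vptr;
};

std::string circle_name(const Shape*) { return "circle"; }
std::string square_name(const Shape*) { return "square"; }

const ShapeVTable circle_vtable{ &circle_name };
const ShapeVTable square_vtable{ &square_name };

// A "virtual call": load vptr from the object, pick the slot, call through it.
// With a null `s`, the very first load (s->vptr) is the one that would fault.
std::string call_name(const Shape* s) { return s->vptr->name(s); }

void check_manual_dispatch() {
    Shape c{ &circle_vtable };
    Shape q{ &square_vtable };
    assert(call_name(&c) == "circle");
    assert(call_name(&q) == "square");
}
```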
Where exactly is the 'this' pointer stored in memory? Is it allocated on the stack, in the heap, or in the data segment?
#include <iostream>
using namespace std;
class ClassA
{
int a, b;
public:
void add()
{
a = 10;
b = 20;
cout << a << b << endl;
}
};
int main()
{
ClassA obj;
obj.add();
return 0;
}
In the above code I am calling the member function add() and the receiver object is passed implicitly as the 'this' pointer. Where is this stored in memory?
The easiest way is to think of this as being a hidden extra argument that is always passed automatically.
So, a fictional method like:
size_t String::length(void) const
{
return strlen(m_string);
}
is actually more like this under the hood:
size_t String__length(const String *this)
{
return strlen(this->m_string);
}
and a call like:
{
String example("hello");
cout << example.length();
}
becomes something like:
cout << String__length(&example);
Note that the above transformation is simplified, hopefully to make my point a bit clearer. No need to fill up the comments with "whaaa, where's the marshalling for method overloading, huh?"-type objection, please. :)
That transforms the question into "where are arguments stored?", and the answer is of course "it depends". :)
It's often on the stack, but it could be in registers too, or any other mechanism that the compiler considers is good for the target architecture.
Other answers have done a very good job explaining how a typical compiler implements this (by passing it as an implicit first parameter to the function).
I think it's also useful to see what the C++ ISO spec explicitly says about this. According to the C++03 ISO spec, §9.3.2/1:
In the body of a nonstatic (9.3) member function, the keyword this is a non-lvalue expression whose value is the address of the object for which the function is called.
It's important to note that this is not a variable - it's an expression, much in the same way that the expression 1 + 2 * 3 is an expression. The value of this expression is permitted to be stored pretty much anywhere. The compiler might put it on the stack and pass it as an implicit parameter to a function, or it might put it in a register, and it conceivably could put it in the heap or in the data segment. The C++ specification deliberately gives the implementation some flexibility here.
I think that the "language-lawyer" answer is "this is completely implementation-defined, and moreover this is technically not a pointer, but an expression that evaluates to a pointer."
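A quick way to see that this is a value rather than a stored variable: its value simply follows wherever the object lives. A sketch with a hypothetical Widget class; note that writing &this inside self() would not compile, because this has no address of its own.

```cpp
#include <cassert>

class Widget {
public:
    // Returns the value of the `this` expression for the current call.
    const Widget* self() const { return this; }
};

void check_this_follows_object() {
    Widget on_stack;                    // automatic storage
    static Widget in_static_storage;    // static storage
    Widget* on_heap = new Widget;       // dynamic storage

    // In each call, `this` evaluates to the address of the object used for the call.
    assert(on_stack.self() == &on_stack);
    assert(in_static_storage.self() == &in_static_storage);
    assert(on_heap->self() == on_heap);

    delete on_heap;
}
```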
Hope this helps!
this is usually passed as a hidden argument of the method (the only difference throughout different calling conventions is how).
If you call:
myClass.Method(1, 2, 3);
Compiler generates the following code:
Method(&myClass, 1, 2, 3);
Where the first parameter is actually the pointer to this.
Let's check the following code:
class MyClass
{
private:
int a;
public:
void __stdcall Method(int i)
{
a = i;
}
};
int main(int argc, char *argv[])
{
MyClass myClass;
myClass.Method(5);
return 0;
}
By using __stdcall I forced the compiler to pass all parameters through the stack. If you then start the debugger and inspect the assembly code, you'll find something like the following:
myClass.Method(5);
00AA31BE push 5
00AA31C0 lea eax,[myClass]
00AA31C3 push eax
00AA31C4 call MyClass::Method (0AA1447h)
As you see, the parameter of the method is passed through the stack, then address of myClass is loaded to eax register and again pushed on the stack. In other words, this is treated as a regular parameter of this method.
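The same idea can be shown without a debugger. std::mem_fn from <functional> wraps a member function so that the object is passed as an ordinary first argument. A sketch using a simplified MyClass (no __stdcall, and a get() accessor added here for checking):

```cpp
#include <cassert>
#include <functional>

class MyClass {
    int a = 0;
public:
    void Method(int i) { a = i; }
    int get() const { return a; }   // added so the effect is observable
};

int call_with_explicit_object() {
    MyClass obj;
    auto method = std::mem_fn(&MyClass::Method);
    method(&obj, 5);                // same effect as obj.Method(5)
    return obj.get();
}

void check_explicit_object() {
    assert(call_with_explicit_object() == 5);
}
```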
this is an rvalue (you cannot take its address), so it doesn't (necessarily) occupy memory at all. Depending on the compiler and the target architecture, it will often be in a register: i0 on a Sparc, ECX with MSVC on Intel, etc. When the optimizer is active, it can even move around. (I've seen it in different registers with MSVC.)
this behaves mostly like a function argument, and as such will be stored on the stack or - if the binary calling conventions of the architecture allow that - in a register.
this isn't stored at a well-defined location! The object that it points to is stored somewhere, and has a well-defined address, but the address itself does not have a specific home address. It is communicated around in the program. Not only that, but there can be many copies of that pointer.
In the following imaginary init function, the object registers itself to receive events and timer callbacks (using imaginary event source objects). So after the registration, there are two additional copies of this:
void foo_listener::init()
{
g_usb_events.register(this); // register to receive USB events
g_timer.register(this, 5); // register for a 5 second timer
}
In a function activation chain, there will also be multiple copies of the this pointer. Suppose we have an object obj and call its foo function. That function calls the same object's bar function, and bar calls another function called update. Each function activation level has the this pointer. It's stored in a machine register, or in a memory location in the stack frame of the function activation.
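A small sketch of the "many copies" point. g_registry is a hypothetical stand-in for the imaginary event sources above, and Listener is a made-up class; the checks only confirm that every copy holds the same address.

```cpp
#include <cassert>
#include <vector>

struct Listener;
std::vector<Listener*> g_registry;   // hypothetical registry of subscribers

struct Listener {
    // Two registrations store two more copies of the same address.
    void init() {
        g_registry.push_back(this);
        g_registry.push_back(this);
    }
    // Nested activations: outer() and inner() each receive their own copy of `this`.
    const Listener* outer() const { return inner(); }
    const Listener* inner() const { return this; }
};

void check_copies_agree() {
    g_registry.clear();
    Listener l;
    l.init();
    assert(g_registry.size() == 2);
    assert(g_registry[0] == &l && g_registry[1] == &l);
    assert(l.outer() == &l);
    g_registry.clear();              // do not keep pointers to a dead local
}
```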
I want to see the content of the vtable of class A, especially the virtual destructor, but I cannot call it through a function pointer.
Here is my code:
typedef void (*fun)();
class A {
public:
virtual void func() {printf("A::func() is called\n");}
virtual ~A() {printf("A::~A() is called\n");}
};
//enter in the vtable
void *getvtable (void* p, int off){
return (void*)*((unsigned int*)p+off);
}
//off_obj is used for multiple inheritance (so not here); off_vtable specifies the position of the function in the vtable
fun getfun (A* obj, unsigned int off_obj,int off_vtable){
void *vptr = getvtable(obj,off_obj);
unsigned char *p = (unsigned char *)vptr;
p += sizeof(void*) * off_vtable;
return (fun)getvtable(p,0);
}
void main() {
A* ptr_a = new A;
fun pfunc = getfun(ptr_a,0,0);
(*pfunc)();
pfunc = getfun(ptr_a,0,1);
(*pfunc)(); //error occurred here, this is supposed to be the virtual destructor, why?
}
Let's suppose for the sake of argument that the vtable in question really is laid out the way you think it is, as a table of ordinary memory addresses, and that when casting those addresses to function pointers, they're callable.
You have at least two problems:
The calling convention for the member functions isn't necessarily the same as for ordinary functions. Microsoft's default calling convention is thiscall, which places a pointer to the object whose method is being called in the ECX register. There's no facility for specifying that manually; the only way to make that happen is by calling a member function in the way member functions are called, which involves syntax like obj.f() or pobj->f(). You can't do that with pointers to functions (not even member-function pointers), unless you write machine code or assembler to get all the low-level details right.
You happen not to hit this problem for func because it doesn't make reference to this (either directly or by implicit reference to other members). The destructor does, though. Destructors are special, and what's actually stored in the vtable is a pointer to a compiler-generated helper function that calls the real destructor and then checks some flags passed as a hidden parameter to determine whether it should free the object's memory. The value that happens to be in ECX doesn't matter for the func call, but it's very important to be right for the ~A call.
Destructors aren't like normal functions. As I mentioned above, the compiler can generate one or more helper functions, and they receive parameters in addition to this. You haven't accounted for that in your code. The compiler generates separate helpers for array and non-array destructors, so right now we don't even know which one you found at index 1 of the vtable. But since you didn't pass it a valid flag parameter, and there's no way to pass it the this value, it doesn't really matter what you find in the vtable anyway.
You can attempt to solve the first problem by specifying a different calling convention, like stdcall. That puts the this parameter back on the stack with the rest of the parameters, and that allows you to pass it when you call the function pointer. For func, fun would need to have a declaration like this:
typedef void (__stdcall * fun)(A*);
Invoke pfunc like this:
pfunc(ptr_a);
To solve the second problem, you'll need to determine the actual order of the vtable functions so you know to find the right destructor helper. And to call it, you'd need a different function-pointer declaration, too. Destructors don't technically have a return type, but void works well enough. You could use something like this:
typedef void (__stdcall * destr)(A*, unsigned flags);
For most of this answer, I've used an article by Igorsk about recognizing certain patterns in a program for the purpose of decompiling it back into C++. Part 2 covers classes.
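You normally don't call the destructor yourself. You use a delete expression, which runs the destructor (through the vtable when it is virtual) and then calls operator delete() to free the memory. Calling the destructor through a raw address pulled out of the vtable is Undefined Behavior, in the same sense that dereferencing NULL is, i.e. it blows up on every platform I've seen.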
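If the goal is just to invoke virtual functions indirectly, the language already offers portable tools: a pointer to member function still dispatches virtually, and a delete expression finds the right destructor. A sketch with a hypothetical derived class B and a g_log string added to record what ran:

```cpp
#include <cassert>
#include <string>

std::string g_log;   // hypothetical log of which functions ran

class A {
public:
    virtual void func() { g_log += "A::func;"; }
    virtual ~A() { g_log += "A::~A;"; }
};

class B : public A {
public:
    void func() override { g_log += "B::func;"; }
    ~B() override { g_log += "B::~B;"; }
};

std::string run_through_member_pointer() {
    g_log.clear();
    A* p = new B;
    void (A::*pf)() = &A::func;   // pointer to member, not a raw code address
    (p->*pf)();                   // still a virtual call: B::func runs
    delete p;                     // runs B::~B, then A::~A, then frees the memory
    return g_log;
}

void check_member_pointer() {
    assert(run_through_member_pointer() == "B::func;B::~B;A::~A;");
}
```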
I am storing objects in a buffer. Now I know that I cannot make assumptions about the memory layout of the object.
If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?
e.g. say I have the following class:
[int,int,int,int,char,padding*3bytes,unsigned short int*]
1)
if I know this class to be of size 24 and I know the address of where it starts in memory
whilst it is not safe to assume the memory layout, is it acceptable to cast this to a pointer and call functions on this object which access these members?
(Does c++ know by some magic the correct position of a member?)
2)
If this is not safe/ok, is there any other way other than using a constructor which takes all of the arguments and pulling each argument out of the buffer one at a time?
Edit: Changed title to make it more appropriate to what I am asking.
You can create a constructor that takes all the members and assigns them, then use placement new.
class Foo
{
int a;int b;int c;int d;char e;unsigned short int*f;
public:
Foo(int A,int B,int C,int D,char E,unsigned short int*F) : a(A), b(B), c(C), d(D), e(E), f(F) {}
};
...
char *buf = new char[sizeof(Foo)]; //pre-allocated buffer
Foo *f = new (buf) Foo(a,b,c,d,e,f);
This has the advantage that even the v-table will be generated correctly. Note, however, if you are using this for serialization, the unsigned short int pointer is not going to point at anything useful when you deserialize it, unless you are very careful to use some sort of method to convert pointers into offsets and then back again.
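A compact, checkable variant of the placement-new idea, as a sketch: the Foo here is a cut-down hypothetical version with one virtual function, the buffer is aligned with alignas, and the object is destroyed with an explicit destructor call, which is the usual counterpart of placement new.

```cpp
#include <cassert>
#include <new>

class Foo {
    int a;
    char e;
public:
    Foo(int A, char E) : a(A), e(E) {}
    virtual ~Foo() = default;
    virtual int sum() const { return a + e; }   // goes through the v-table
};

int build_in_buffer() {
    // Raw storage with the size and alignment Foo requires.
    alignas(Foo) unsigned char buf[sizeof(Foo)];

    Foo* f = new (buf) Foo(40, 2);   // constructs the object, v-table pointer included
    int result = f->sum();
    f->~Foo();                       // placement new is paired with an explicit destructor call
    return result;
}

void check_build_in_buffer() {
    assert(build_in_buffer() == 42);
}
```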
Individual methods on a this pointer are statically linked and are simply a direct call to the function with this being the first parameter before the explicit parameters.
Member variables are referenced using an offset from the this pointer. If an object is laid out like this:
0: vtable
4: a
8: b
12: c
etc...
a will be accessed by dereferencing this + 4 bytes.
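The "this plus offset" access can be reproduced with offsetof from <cstddef>. A sketch with a hypothetical struct Obj that has no vtable, so its offsets start at 0 instead of 4; read_b_by_offset is an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Obj { int a; int b; int c; };   // hypothetical standard-layout type

// Read member `b` the way compiled code does: object address plus a fixed byte offset.
int read_b_by_offset(const Obj* self) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(self);
    int value;
    std::memcpy(&value, base + offsetof(Obj, b), sizeof value);
    return value;
}

void check_offset_access() {
    Obj o{1, 2, 3};
    assert(read_b_by_offset(&o) == o.b);
    assert(offsetof(Obj, a) == 0);     // first member of a standard-layout struct
}
```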
Basically what you are proposing is reading in a bunch of (hopefully not random) bytes, casting them to a known object, and then calling a class method on that object. It might actually work, because those bytes are going to end up behind the "this" pointer in that class method. But you're taking a real chance on things not being where the compiled code expects them to be. And unlike Java or C#, there is no real "runtime" to catch these sorts of problems, so at best you'll get a core dump, and at worst you'll get corrupted memory.
It sounds like you want a C++ version of Java's serialization/deserialization. There is probably a library out there to do that.
Non-virtual function calls are linked directly just like a C function. The object (this) pointer is passed as the first argument. No knowledge of the object layout is required to call the function.
It sounds like you're not storing the objects themselves in a buffer, but rather the data from which they're comprised.
If this data is in memory in the order the fields are defined within your class (with proper padding for the platform) and your type is a POD, then you can memcpy the data from the buffer to a pointer to your type (or possibly cast it, but beware, there are some platform-specific gotchas with casts to pointers of different types).
If your class is not a POD, then the in-memory layout of fields is not guaranteed, and you shouldn't rely on any observed ordering, as it is allowed to change on each recompile.
You can, however, initialize a non-POD with data from a POD.
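For the POD case, the memcpy route looks roughly like this. Record and from_bytes are hypothetical names, and the sketch assumes the buffer was produced by the same program on the same platform, so padding and byte order match.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Record { int id; int count; };   // hypothetical POD stored in the buffer

static_assert(std::is_trivially_copyable<Record>::value,
              "memcpy is only appropriate for trivially copyable types");

// Copy the bytes into a real Record instead of casting the buffer pointer.
Record from_bytes(const unsigned char* bytes) {
    Record r;
    std::memcpy(&r, bytes, sizeof r);
    return r;
}

void check_round_trip() {
    Record original{7, 3};
    unsigned char buffer[sizeof(Record)];
    std::memcpy(buffer, &original, sizeof original);

    Record copy = from_bytes(buffer);
    assert(copy.id == 7);
    assert(copy.count == 3);
}
```

Copying into a properly declared object sidesteps the alignment concerns that come with casting a byte buffer to a pointer of another type.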
As far as the addresses where non-virtual functions are located: they are statically linked at compile time to some location within your code segment that is the same for every instance of your type. Note that there is no "runtime" involved. When you write code like this:
class Foo{
int a;
int b;
public:
void DoSomething(int x);
};
void Foo::DoSomething(int x){a = x * 2; b = x + a;}
int main(){
Foo f;
f.DoSomething(42);
return 0;
}
the compiler generates code that does something like this:
function main:
1. allocate 8 bytes on stack for object "f"
2. call default initializer for class "Foo" (does nothing in this case)
3. push argument value 42 onto stack
4. push pointer to object "f" onto stack
5. make call to function Foo_i_DoSomething#4 (actual name is usually more complex)
6. load return value 0 into accumulator register
7. return to caller
function Foo_i_DoSomething#4 (located elsewhere in the code segment):
1. load "x" value from stack (pushed on by caller)
2. multiply by 2
3. load "this" pointer from stack (pushed on by caller)
4. calculate offset of field "a" within a Foo object
5. add calculated offset to this pointer, loaded in step 3
6. store product, calculated in step 2, to address calculated in step 5
7. load "x" value from stack, again
8. load "this" pointer from stack, again
9. calculate offset of field "a" within a Foo object, again
10. add calculated offset to this pointer, loaded in step 8
11. load "a" value stored at that address
12. add "a" value, loaded in step 11, to "x" value loaded in step 7
13. load "this" pointer from stack, again
14. calculate offset of field "b" within a Foo object
15. add calculated offset to this pointer, loaded in step 13
16. store sum, calculated in step 12, to address calculated in step 15
17. return to caller
In other words, it would be more or less the same code as if you had written this (specifics, such as name of DoSomething function and method of passing this pointer are up to the compiler):
class Foo{
int a;
int b;
friend void Foo_DoSomething(Foo *f, int x);
};
void Foo_DoSomething(Foo *f, int x){
f->a = x * 2;
f->b = x + f->a;
}
int main(){
Foo f;
Foo_DoSomething(&f, 42);
return 0;
}
An object having POD type, in this case, is already created (whether or not you call new; allocating the required storage already suffices), and you can access its members, including calling a function on that object. But that will only work if you know precisely the required alignment of T, the size of T (the buffer may not be smaller than it), and the alignment of all the members of T. Even for a POD type, the compiler is allowed to put padding bytes between members if it wants. For non-POD types, you can have the same luck if your type has no virtual functions or base classes and no user-defined constructor (of course), and the same applies to the base and all its non-static members too.
For all other types, all bets are off. You have to read values out first with a POD, and then initialize a non-POD type with that data.
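The two-step approach for non-POD types might look like the following sketch. WirePoint (the POD that matches the buffer) and Point (the real class) are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

struct WirePoint { int x; int y; };   // POD: exactly what sits in the buffer

class Point {                          // non-POD: user constructor and virtual functions
    int x_, y_;
public:
    explicit Point(const WirePoint& w) : x_(w.x), y_(w.y) {}
    virtual ~Point() = default;
    virtual std::string describe() const {
        return std::to_string(x_) + "," + std::to_string(y_);
    }
};

// Step 1: read the raw bytes into the POD. Step 2: construct the real object from it.
Point load_point(const unsigned char* bytes) {
    WirePoint w;
    std::memcpy(&w, bytes, sizeof w);
    return Point(w);
}

void check_load_point() {
    WirePoint w{3, 4};
    unsigned char buffer[sizeof w];
    std::memcpy(buffer, &w, sizeof w);
    assert(load_point(buffer).describe() == "3,4");
}
```

The constructor runs normally, so the v-table pointer and any invariants are set up by the compiler rather than guessed from bytes.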
I am storing objects in a buffer. ... If I know the overall size of the object, is it acceptable to create a pointer to this memory and call functions on it?
This is acceptable to the extent that using casts is acceptable:
#include <iostream>
namespace {
class A {
int i;
int j;
public:
int value()
{
return i + j;
}
};
}
int main()
{
int buffer[] = { 1, 2 }; // two ints, matching A's two int members
std::cout << reinterpret_cast<A*>(buffer)->value() << '\n';
}
Casting an object to something like raw memory and back again is actually pretty common, especially in the C world. If you're using a class hierarchy, though, it would make more sense to use pointer to member functions.
say I have the following class: ...
if I know this class to be of size 24 and I know the address of where it starts in memory ...
This is where things get difficult. The size of an object includes the size of its data members (and any data members from any base classes) plus any padding plus any function pointers or implementation-dependent information, minus anything saved from certain size optimizations (empty base class optimization). If the resulting number is 0 bytes, then the object is required to take at least one byte in memory. These things are a combination of language issues and common requirements that most CPUs have regarding memory accesses. Trying to get things to work properly can be a real pain.
If you just allocate an object and cast to and from raw memory you can ignore these issues. But if you copy an object's internals to a buffer of some sort, then they rear their head pretty quickly. The code above relies on a few general rules about alignment (i.e., I happen to know that class A will have the same alignment restrictions as ints, and thus the array can be safely cast to an A; but I couldn't necessarily guarantee the same if I were casting parts of the array to A's and parts to other classes with other data members).
Oh, and when copying objects you need to make sure you're properly handling pointers.
You may also be interested in things like Google's Protocol Buffers or Facebook's Thrift.
Yes, these issues are difficult. And, yes, some programming languages sweep them under the rug. But there's an awful lot of stuff getting swept under the rug:
In Sun's HotSpot JVM, object storage is aligned to the nearest 64-bit boundary. On top of this, every object has a 2-word header in memory. The JVM's word size is usually the platform's native pointer size. (An object consisting of only a 32-bit int and a 64-bit double -- 96 bits of data -- will require) two words for the object header, one word for the int, two words for the double. That's 5 words: 160 bits. Because of the alignment, this object will occupy 192 bits of memory.
This is because Sun is relying on a relatively simple tactic for memory alignment issues (on an imaginary processor, a char may be allowed to exist at any memory location, an int at any location that is divisible by 4, and a double may need to be allocated only on memory locations that are divisible by 32 -- but the most restrictive alignment requirement also satisfies every other alignment requirement, so Sun is aligning everything according to the most restrictive location).
Another tactic for memory alignment can reclaim some of that space.
If the class contains no virtual functions (and therefore class instances have no vptr), and if you make correct assumptions about the way in which the class' member data is laid out in memory, then doing what you're suggesting might work (but might not be portable).
Yes, another way (more idiomatic but not much safer ... you still need to know how the class lays out its data) would be to use the so-called "placement operator new" and a default constructor.
That depends upon what you mean by "safe". Any time you cast a memory address into a pointer in this way you are bypassing the type safety features provided by the compiler, and taking the responsibility on yourself. If, as Chris implies, you make an incorrect assumption about the memory layout, or compiler implementation details, then you will get unexpected results and lose portability.
Since you are concerned about the "safety" of this programming style it is likely worth your while to investigate portable and type-safe methods such as pre-existing libraries, or writing a constructor or assignment operator for the purpose.