Where exactly is the 'this' pointer stored in memory? Is it allocated on the stack, in the heap, or in the data segment?
#include <iostream>
using namespace std;
class ClassA
{
int a, b;
public:
void add()
{
a = 10;
b = 20;
cout << a << b << endl;
}
};
int main()
{
ClassA obj;
obj.add();
return 0;
}
In the above code I am calling the member function add() and the receiver object is passed implicitly as the 'this' pointer. Where is this stored in memory?
The easiest way is to think of this as being a hidden extra argument that is always passed automatically.
So, a fictional method like:
size_t String::length(void) const
{
return strlen(m_string);
}
is actually more like this under the hood:
size_t String__length(const String *this)
{
return strlen(this->m_string);
}
and a call like:
{
String example("hello");
cout << example.length();
}
becomes something like:
cout << String__length(&example);
Note that the above transformation is simplified, hopefully to make my point a bit clearer. No need to fill up the comments with "whaaa, where's the marshalling for method overloading, huh?"-type objections, please. :)
That transforms the question into "where are arguments stored?", and the answer is of course "it depends". :)
It's often on the stack, but it could be in registers too, or any other mechanism that the compiler considers good for the target architecture.
Other answers have done a very good job explaining how a typical compiler implements this (by passing it as an implicit first parameter to the function).
I think it's also useful to see what the C++ ISO spec explicitly says about this. According to the C++03 ISO spec, §9.3.2/1:
In the body of a nonstatic (9.3) member function, the keyword this is a non-lvalue expression whose value is the address of the object for which the function is called.
It's important to note that this is not a variable - it's an expression, much in the same way that the expression 1 + 2 * 3 is an expression. The value of this expression is permitted to be stored pretty much anywhere. The compiler might put it on the stack and pass it as an implicit parameter to a function, or it might put it in a register, and it conceivably could put it in the heap or in the data segment. The C++ specification deliberately gives the implementation some flexibility here.
I think that the "language-lawyer" answer is "this is completely implementation-defined, and moreover this is technically not a pointer, but an expression that evaluates to a pointer."
Hope this helps!
this is usually passed as a hidden argument of the method (the only difference throughout different calling conventions is how).
If you call:
myClass.Method(1, 2, 3);
The compiler generates code roughly equivalent to:
Method(&myClass, 1, 2, 3);
Where the first parameter is actually the pointer to this.
Let's check the following code:
class MyClass
{
private:
int a;
public:
void __stdcall Method(int i)
{
a = i;
}
};
int main(int argc, char *argv[])
{
MyClass myClass;
myClass.Method(5);
return 0;
}
By using __stdcall I forced the compiler to pass all parameters through the stack. If you then start the debugger and inspect the assembly code, you'll find something like the following:
myClass.Method(5);
00AA31BE push 5
00AA31C0 lea eax,[myClass]
00AA31C3 push eax
00AA31C4 call MyClass::Method (0AA1447h)
As you can see, the method's parameter is pushed onto the stack first; then the address of myClass is loaded into the eax register and pushed onto the stack as well. In other words, this is treated as a regular parameter of this method.
this is an rvalue (you cannot take its address), so it doesn't
(necessarily) occupy memory at all. Depending on the compiler
and the target architecture, it will often be in a register: i0
on a Sparc, ECX with MSVC on Intel, etc. When the optimizer is
active, it can even move around. (I've seen it in different
registers with MSVC).
this behaves mostly like a function argument, and as such will be stored on the stack or - if the binary calling conventions of the architecture allow that - in a register.
this isn't stored at a well-defined location! The object that it points to is stored somewhere, and has a well-defined address, but the address itself does not have a specific home address. It is communicated around in the program. Not only that, but there can be many copies of that pointer.
In the following imaginary init function, the object registers itself to receive events and timer callbacks (using imaginary event source objects). So after the registration, there are two additional copies of this:
void foo_listener::init()
{
g_usb_events.register_listener(this); // register to receive USB events ('register' itself is a reserved word)
g_timer.register_listener(this, 5); // register for a 5 second timer
}
In a function activation chain, there will also be multiple copies of the this pointer. Suppose we have an object obj and call its foo function. That function calls the same object's bar function, and bar calls another function called update. Each function activation level has its own copy of the this pointer. It's stored in a machine register, or in a memory location in the stack frame of the function activation.
I know that the value of a given this cannot be determined at compile time. I am left wondering that once a given object is allocated and constructed, is the value of this cached, or is there literally a runtime evaluation of the expression every use? Here is the specific example that motivates my question. Be warned, this violates about every OOP doctrine and protection feature C++ aims to uphold.
int main()
{
string s1 = string("I am super a super long string named s1, and won't be SSO");
string s2 = string("I am super a super long string named s2, and won't be SSO");
byte* s1interface = reinterpret_cast<byte*>(&s1);
byte* s2interface = reinterpret_cast<byte*>(&s2);
static_assert(sizeof s1 == sizeof s2);
for(int offset(0); offset < sizeof s1; ++offset)
{
*(s1interface + offset) ^= *(s2interface + offset);
*(s2interface + offset) ^= *(s1interface + offset);
*(s1interface + offset) ^= *(s2interface + offset);
}
cout << s1 << '\n' << s2 << "\n\n\n";
return 0;
}
//outputs:
//I am super a super long string **named s2**, and won't be SSO
//I am super a super long string **named s1**, and won't be SSO
//(The emphasis on the output strings was added by me to highlight the identity change)
I want to start off by saying this program not only compiles, but produces that output consistently. My question is not predicated on why/how this works.
The way I see it, any internal variables, even those managing heap memory, will be transplanted as soon as the objects are (re)-fully-formed. However, I envision a hypothetical scenario in which this is queried by an object and then stored internally. After the object transplant operation, &me would not coincide with a this that was queried and stored upon original construction, which would seriously corrupt any operations that use this coupled with any runtime address reflection. Though this should never be done, and all bets are off if anyone dares do anything so heinous with any object theirs or otherwise, does the Standard dictate a persistent evaluation of this, or just for this to do what it says under the assumption the object will only occupy the space where it is put?
EDIT: Let me explain it another way, If during runtime, the object has a hidden and internal this that gets written once it is allocated, and all subsequent reads of this read the stored value, after the transplant &object and this will not be the same. That is clearly not how it is implemented by my compiler, but I want to know if that is by conformance or by luck.
The value of this is never "queried" by an object. It is passed to (non-static) object methods as an implicit parameter.
Suppose you have this C++ code:
#include <stdio.h>
class mystring
{
public:
const char *data; // const: a string literal cannot initialise a plain char* in C++11 and later
void print();
};
void mystring::print()
{
fputs(this->data, stdout);
}
int
main()
{
mystring s = {"Hello World"};
s.print();
}
Now it looks like the method print does not take any parameters, but it actually does: a pointer to the object s. So the compiler will generate code equivalent to this C program.
#include <stdio.h>
struct mystring
{
char *data;
};
void mystring_print(struct mystring *this)
{
fputs(this->data, stdout);
}
int
main(void)
{
struct mystring s = {"Hello World"};
mystring_print(&s);
}
So there is nothing magical about the this pointer. It is just a boring parameter like any other. Things get a little more interesting with virtual methods, but the handling of this stays the same.
So I reached out to Stephan Lavavej, who maintains the Standard Library implementation for Microsoft, with this question. I will post his answer below. I do want to point out that user HAL9000 was essentially right with his answer, but since Stephan's was so thorough, I'll post it, and eventually designate it as the official answer (it's hard to get more official than the words of someone who actually maintains a Big 3 implementation of the Standard). If you find this answer informative, HAL9000's answer has a visual example to reinforce the idea.
Stephan's Words:
You shouldn't think of the "this" pointer as being stored within an object. The implicit parameter mental model is the most accurate one.
When a function x() calls a member function Meow::y() on a Meow object m, x() has to know the address of m already. It might be a local variable (x() knows where all of its local variables live on the stack), it might be a dereferenced pointer (if m is *ptr, ptr points to m), it might be passed via reference (references are not pointers, but they effectively have the same location information as pointers), it might be an element of an array (if m is arr[idx], then arr + idx points to m), etc. So Meow::y() will be implicitly passed the address of m, which becomes "this" within the member function.
Crucially, if you have a struct that contains plain old data (e.g. a bunch of ints) and you swap the contents of two structs, the objects don't change identity - only their contents. If I take all of your stuff in your house and swap it with the stuff within someone else's house, the locations of the houses are unchanged. (In C++, objects can't migrate from one memory address to another - the most you can do is create a separate object, move all the stuff, tell anyone who cares about the old location to instead point to the new location, and then destroy the empty shell of the original object - that's move semantics, basically.)
Because "this" isn't actually stored anywhere, it has zero cost, which is useful to know. (This is outside virtual member functions, which do make objects pay a vptr cost, but that's much more advanced.)
Hope this helps,
STL
Right now I'm learning the ins and outs of C and C++. I know that when you create an array inside a function, then it is stored inside that function's stack frame. You can return the base address of the array, which is in fact a pointer to the first element in that array. That returned pointer value gets stored into the EAX/RAX register, and then the value from the register is then moved into a pointer variable local to the calling function. The problem is that when the function returns, that function's stack frame gets popped off the call stack, and any data declared inside that function's stack frame expires. The pointer is now pointing to an invalid memory location.
I want to be able to return an array from a called function BY VALUE, not by pointer. The array has to be created inside the function and stored on the stack. I want to return an array by value just as you would return an int that was declared inside the called function.
int f() {
int a = 5;
return a; // returned by value
}
int main() {
int b = f();
return 0;
}
Here the int value is moved into the EAX/RAX register, so it is a copy. The called function's stack frame is cleared off the call stack, but there is no problem since the returned value is now stored in the register just before copying it into int b.
I know that in C++ I can create a vector inside the called function and then return it by value. But I don't want to use such higher level abstractions in favor of learning a "hacky" way to do it. I'll come back to vectors in a bit.
Well, I realized that it is possible to return a struct object by value from a function. So my solution to returning an array by value is very simple: put it inside a struct, and return that struct by value!
struct String {
char array[20];
};
struct String f() {
struct String myString;
strcpy(myString.array, "Hello World");
return myString; // Is this returned by value?
}
int main() {
struct String word = f();
printf("%s\n", word.array);
}
Please clarify me if I understand the code correctly. That struct object gets created inside the called function's stack frame, "Hello World" is copied into the array contained within, and then what?
The struct String word is an lvalue and f() returns an rvalue. When one struct is assigned to another, all of its data members are copied one by one.
What happens in between, just after the struct is returned from the called function by value, and before it is assigned to the struct inside the main() function? The EAX/RAX register is the destination for returned values. It is either 64 bits or 32 bits depending on whether you have a 64 or 32-bit computer. How exactly do you fit a struct object into a register? I imagine that the array may be not only 20 bytes, but let's say 100 bytes! Is the struct copied from the function into the register piece-by-piece? Or is it copied from one memory location on the stack to another by value all in one go? And also what happens to the original struct object which was created inside the called function? Those are all questions that I'd like to know answers to.
Also, about returning vectors from functions by value. Vectors in C++ are classes, and classes are similar to structs. Can you answer the question, what happens when you return a vector by value from a function? And what happens when you pass a class/struct object into a function as a parameter?
I can imagine how pass by value works with small data types. I don't even know how it works for complex data types and data structures.
The precise mechanism is platform-dependent. But the most common mechanism is that the caller allocates space on its stack for the struct to be returned and passes the address of that space as an extra argument, usually before all the real arguments.
On many platforms, structs small enough to fit in a register will be returned as though they were a single value. This would apply on x86-64 for a struct consisting of two 32-bit ints, since they could be returned in a single 64-bit register. How large a struct can be handled this way will vary from platform to platform.
The cost of passing larger structs by value can be ameliorated by copy elision. If, for example, you write
struct MyThingy blob = blobMaker();
the compiler is likely to pass blobMaker the address of the variable blob rather than allocating a temporary variable and then copying the temporary to blob after the function returns. The called function may also be able to avoid copies:
struct MyThingy blobMaker(void) {
struct MyThingy retval;
// ...
retval.member1 = some_calc(42);
// ...
retval.member2 = "Hello";
// ...
return retval;
}
Here, the compiler might choose to not allocate retval in the called function's stack frame, but instead just use the storage passed in the invisible argument directly, thus avoiding a copy at the return. The combination of these two optimisations (when possible) makes returning structs almost free.
The C++ standard provides for these optimisations by explicitly allowing them even in cases where the elided copies might have triggered side effects in the object's copy constructor. (Obviously this case doesn't exist in C.)
Imagine the following scenario:
typedef std::function<float(float)> A;
typedef float(*B)(float);
A foo();
void bar(B b);
You wish to do something along the lines of:
bar(foo());
Obviously this does not work, mainly because A can contain state and B is a function pointer. What if we know that A does not contain any state and we wish to somehow take its "meaning" and put it into something that can be passed for a B?
Is it impossible?
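If you can ensure that the callable object stored in A is a function pointer of exactly type B (for example, a lambda with an empty capture list that was converted to B before it was stored), you can get at the function pointer in this way:
foo().target<B>();
Note that target<B>() returns a pointer to the stored B, or a null pointer if the stored object is not exactly of type B, and that pointer is only valid while the std::function it came from is alive.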
In general, a std::function can "box" some closure (e.g. the value of some lambda function). A closure contains both code and data (the closed-over values), so I believe that you cannot portably convert it to a naked function pointer. BTW, because closures conceptually mix code and data, languages that do not provide them (like C) in practice require callbacks to follow a convention: every function pointer is passed together with some additional data (look into GTK for a concrete example).
Some implementation-specific tricks might make a trampoline function on the stack (e.g. dynamically generate, perhaps with asmjit, some machine code containing a pointer to the closure, etc.). But this is not portable and is system-specific (in particular because the stack needs to be executable).
What if we know that A does not contain a state and we wish to somehow take its "meaning" and put it into something that can be passed for a B?
Even that isn't sufficient. std::function provides a target() member function, that if you know the exact type of the underlying functor, you can get it back. So we can do, for instance:
void print(int i) { std::cout << i; }
std::function<void(int)> f = print;
auto ptr = f.target<void(*)(int)>(); // not null
(*ptr)(42); // works
However, even if our functor f doesn't contain state, that doesn't mean that its underlying type is precisely void(*)(int). It could be a completely different function pointer, in which case we wouldn't be able to pull it out:
int square(int i) { return i*i; }
f = square;
ptr = f.target<void(*)(int)>(); // nullptr!
Or it could be a lambda, in which case we wouldn't even be able to name the type, much less pull it out:
f = [](int i){ std::cout << i; }; // same as print, right?
ptr = f.target<void(*)(int)>(); // ... nope :'(
Basically, type erasure is really type erasure. If you need the original underlying type, that's likely indicative of a bad design.
Suppose we have a class:
class Foo
{
private:
int a;
public:
void func()
{
a = 0;
printf("In Func\n");
}
};
int main()
{
Foo *foo = new Foo();
foo->func();
return 0;
}
When the object of the class Foo is created and initialized, I understand that integer a will take up 4 bytes of memory. How is the function stored? What happens in memory / stack /registers / with the program counter when calling foo->func()?
The short answer: It will be stored in the text or code section of the binary only once irrespective of the number of instances of the class created.
The functions are not stored separately anywhere for each instance of a class. They are treated the same way as any other non-member function would be. The only difference is that the compiler actually adds an extra parameter to the function, which is a pointer of the class type.
For example the compiler will generate the function prototype like this:
void func(Foo* this);
(Note that this may not be the final signature. The final signature can be much more cryptic depending on various factors including the compiler)
Any reference to a member variable will be replaced by
this-><member> //for your ex. a=0 translates to this->a = 0;
So the line foo->func(); roughly translates to:
1. Push the value of the Foo* onto the stack. (Compiler dependent.)
2. Call func, which causes the instruction pointer to jump to the address of func in the executable. (Architecture dependent.)
3. func pops the value from the stack. Any further reference to a member variable is preceded by dereferencing this value.
Your function is not virtual, so it is called statically: the compiler inserts a jump to the code segment corresponding to your function. No additional memory is used per instance.
Were your function virtual, your instance would carry a vpointer, which would be dereferenced to find its class' vtable, which would then be indexed to find the function pointer to be called, and finally jump there. The additional cost is thus one vtable per class (probably the size of one function pointer, times the number of virtual functions of your class), and one pointer per instance.
Note that this is a common implementation of virtual calls, but is in no way enforced by the standard, so it could actually not be implemented like that at all but your chances are quite good. The compiler can also often bypass the virtual call system altogether if it has knowledge of the static type of your instance at compile time.
Member functions are just like regular functions, they are stored in the "code" or "text" section. There is one thing special with (non-static) member functions, and that is the "hidden" this argument that is passed along to function. So in your case, the address in foo will be passed to func.
Exactly how that argument is passed, and what happens to registers and stack is defined by the ABI (Application Binary Interface), and varies from processor to processor. There is no strict definition for this, unless you tell us what the combination of compiler, OS and processor is being used (and assuming that information is then publicly available - not all compiler/OS vendors will tell this very clearly). As an example, x86-64 will use RCX for this on Windows, and RDI on Linux, and the call instruction will automatically push the return address onto the stack. On an ARM processor [in Linux, but I think the same applies in Windows, I just have never looked at that], R0 is used for the this pointer, and the BL (branch with link) instruction is used for the call, which as part of itself stores the address of the instruction to return to in lr. lr then has to be saved [probably on the stack] in func, since it calls printf.
Today I stumbled over a piece of code that looked horrifying to me. The pieces were scattered across different files; I have tried to write the gist of it in a simple test case below. The code base is routinely scanned with FlexeLint on a daily basis, but this construct has been lying in the code since 2004.
The thing is that a function implemented with parameter passing by reference is called as a function with parameter passing by pointer... due to a function cast. The construct has worked since 2004 on Irix, and now when porting it actually does work on Linux/gcc too.
My question now: is this a construct one can trust? I can understand if compiler writers implement reference passing as if it were a pointer, but is it reliable? Are there hidden risks?
Should I change fref(..) to use pointers and risk breaking anything in the process?
What do you think?
Edit
In the actual code both fptr(..) and fref(..) use the same struct - changed code below to reflect this better.
#include <iostream>
#include <string.h>
using namespace std;
// ----------------------------------------
// This will be passed as a reference in fref(..)
struct string_struct {
char str[256];
};
// ----------------------------------------
// Using pointer here!
void fptr(string_struct *str)
{
cout << "fptr: " << str->str << endl;
}
// ----------------------------------------
// Using reference here!
void fref(string_struct &str)
{
cout << "fref: " << str.str << endl;
}
// ----------------------------------------
// Cast to void(*)(void*) and call with pointer
void ftest(void (*fin)())
{
string_struct str;
void (*fcall)(void*) = (void(*)(void*))fin;
strcpy(str.str, "Hello!");
fcall(&str);
}
// ----------------------------------------
// Let's go for a test
int main() {
ftest((void (*)())fptr); // test with fptr that's using pointer
ftest((void (*)())fref); // test with fref that's using reference
return 0;
}
What do you think?
Clean it up. That's undefined behavior and thus a bomb which might blow up anytime. A new platform or compiler version (or moon phase, for that matter) could trip it.
Of course, I don't know what the real code looks like, but from your simplified version it seems that the easiest way would be to give string_struct an implicit constructor taking a const char*, templatize ftest() on the function pointer argument, and remove all the casts involved.
It's obviously a horrible technique, and formally it's undefined behaviour and a serious error to call a function through an incompatible type, but it should "work" in practice on a normal system.
At the machine level, a reference and a pointer have exactly the same representation; they are both just the address of something. I would fully expect that fptr and fref compile to exactly the same thing, instruction for instruction, on any computer you could get your hands on. A reference in this context can simply be thought of as syntactic sugar; a pointer that is auto-dereferenced for you. At the machine level they are exactly the same. Obviously there might be some obscure and/or defunct platforms where that might not be the case, but generally speaking that's true 99% of the time.
Furthermore, on most common platforms, all object pointers have the same representation, as do all function pointers. What you've done really isn't all that different from calling a function expecting an int through a type taking a long, on a platform where those types have the same width. It's formally illegal, and all but guaranteed to work.
It can even be inferred from the definition of malloc that all object pointers have the same representation; I can malloc a huge chunk of memory, and stick any (C-style) object I like there. Since malloc only returned one value, but that memory can be reused for any object type I like, it's hard to see how different object pointers could reasonably use different representations, unless the compiler was maintaining a big set of value-representation mappings for every possible type.
void *p = malloc(100000);
foo *f = (foo*)p; *f = some_foo;
bar *b = (bar*)p; *b = some_bar;
baz *z = (baz*)p; *z = some_baz;
quux *q = (quux*)p; *q = some_quux;
(The ugly casts are necessary in C++). The above is required to work. So while I don't think it is formally required that afterwards memcmp(f, b) == memcmp(z, q) == memcmp(f, q) == 0, it's hard to imagine a sane implementation that could make those false.
That being said, don't do this!
It works by pure chance.
fptr expects a const char * while fref expects a string_struct &.
A pointer to string_struct points at the same bytes that a const char * to its contents would, since the struct only contains a 256-byte char array and does not have any virtual members.
In C++, call by reference (e.g. string_struct &) is typically implemented by passing a hidden pointer to the referenced object, so on the call stack it will be the same as if it was passed as a true pointer.
But if the structure string_struct changes, everything will break so the code is not considered safe at all. Also it is dependent on compiler implementation.
Let's just agree that this is very ugly and you're going to change that code.
With the cast you promise that you make sure the types match and they clearly don't.
At least get rid of the C-style cast.