class foo {
public:
foo() : foo_member1(1) {}
int foo_member1;
void do_something() {foo_member1++;}
};
int main(){
foo my_foo;
my_foo.do_something();
}
Where is everything stored in the above example code? It was something I thought of whilst driving home and was embarrassed that I couldn't answer for certain.
I create an object containing a member variable, both on the stack. When I call do_something and enter a new stack frame, how does it have access to the member variable, which was stored in the previous stack frame as part of the my_foo object? Is it copied in? Passed through silently underneath by reference? Or does it actually end up on the heap? Or is there some special mechanism by which code can access something that precedes it in the stack?
This:
my_foo.do_something();
Calls the method foo::do_something() by passing a hidden parameter this = &my_foo. Under the hood, the method is just a regular function that receives this hidden parameter. The code of do_something does not really access the stack frame of main; it simply uses the this pointer that main passed.
Inside of do_something, the code is treated as:
this->foo_member1++;
No cross-frame access. It is just a pointer to a memory location, that happens to be in main's frame. There is no memory protection when passing pointers between functions.
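As a rough sketch of the idea above, the member function can be compared with a hand-written free function that takes the object's address explicitly. The names foo_do_something, self and run_both_forms are made up for this illustration; a real compiler does not necessarily generate anything that looks like this.

```cpp
#include <cassert>

class foo {
public:
    foo() : foo_member1(1) {}
    int foo_member1;
    void do_something() { foo_member1++; }  // implicitly: this->foo_member1++
};

// Hypothetical hand-written equivalent: the object's address is an
// ordinary pointer parameter (called self, because this is a keyword).
void foo_do_something(foo* self) {
    assert(self != nullptr);
    self->foo_member1++;
}

// Runs both forms on one local object and returns the final member value.
int run_both_forms() {
    foo my_foo;                 // lives in run_both_forms' own stack frame
    my_foo.do_something();      // the compiler passes &my_foo as this
    foo_do_something(&my_foo);  // the same thing, spelled out by hand
    return my_foo.foo_member1;  // 1 + 1 + 1
}
```

Both calls increment the same int inside the caller's frame, so run_both_forms() returns 3. Nothing is copied and nothing is moved to the heap.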
For the following C code how can I get the address (pointer) of a from foo() function to the main() function?
For some reasons, I cannot use return in foo()
From main() function, I do not know the data type of a
void foo(void *ptr){
int a = 12345;
ptr = &a;
printf("ptr in abc: %d\n",ptr);
}
int main() {
void *ptr;
foo(ptr);
printf("ptr in main: %d\n",ptr);
//printf("a in main: %d\n",*ptr); //print the value of a (ie. 12345)
return 0;
}
How to get [anything] from a function without using return
A way to get things from inside a function to the outside without returning is to use indirection. Pass a pointer to some object as an argument, and indirect through the pointer inside the function to set the value of the pointed object.
From main() function, I do not know the data type of a
You can point to any object using a void pointer without having to know the type of the object.
To put these things together:
void foo(void** ptr); // declare foo before it is used in main
int main(void) {
void* ptr; // a variable to store the address
foo(&ptr); // pass pointer to the variable
// ptr now points to where a used to be
}
void foo(void** ptr){
int a = 12345;
*ptr = &a; // set the pointed variable
}
Most importantly however: The local object a no longer exists after foo has returned, and therefore the pointer is dangling, and there is not much useful that can be done with it. As such, this is a rather pointless exercise.
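For contrast, here is a minimal sketch of an out-parameter that does not dangle: the function copies the value into storage owned by the caller instead of publishing the address of its own local. It assumes the caller knows the value is an int, which the question says it does not, and the names foo_out and call_foo_out are invented for the example.

```cpp
#include <cassert>

// Hypothetical variant of foo: writes the value through the pointer
// instead of handing out the address of its own local variable.
void foo_out(int* out) {
    assert(out != nullptr);
    int a = 12345;  // local; gone once foo_out returns
    *out = a;       // copy the value into caller-owned storage
}

// The caller owns the storage, so it stays valid after foo_out returns.
int call_foo_out() {
    int result = 0;
    foo_out(&result);
    return result;
}
```

Here call_foo_out() yields 12345, because only the value crosses the function boundary, never a pointer into the dead frame.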
There are 2 main problems with your function foo.
The first one, which is why the program does not compile, is the return type of foo. Because it is void you cannot return any values from it.
The other problem, which will lead to undefined behavior, is that your variable a goes out of scope. If you want to access it after it goes out of scope, it has to be allocated on the heap (e.g. with malloc in C, or new in C++).
For some reasons, I cannot use return in foo()
Because you declared foo as having return type void. If you change that, you can use it:
int* foo() {
int a = 42;
return &a;
}
However, the calling code can’t use that return value since it points to memory that is no longer valid (a local variable in a past function call). This is true regardless of how the calling code gets the pointer: whether it is by returning it, or by passing it to an out parameter. You simply mustn’t do this.
From main() function, I do not know the data type of a
Right, because you explicitly declared the pointer as void* and thus erased the data type. Declare the correct data type to avoid this.
Long story short, there’s no reason to use a void* parameter instead of an int return value here:
int foo() {
int a = 42;
return a;
}
int main(void) {
int a = foo();
printf("a in main: %d\n", a);
}
In order to understand WHY you shouldn't try to return a pointer to a local variable, you need to visualize how local variables are allocated in the first place.
Local variables are allocated on the STACK. The stack is a reserved memory area whose main purpose is to leave a "breadcrumb" trail of memory addresses to which the CPU should jump once it finishes executing a subroutine.
Before a subroutine is entered (usually via a CALL machine language instruction on x86 architectures), the CPU pushes onto the stack the address of the instruction immediately following the CALL.
ret_address_N
. . . . . . .
ret_address_3
ret_address_2
ret_address_1
When the subroutine ends, a RETurn Instruction makes the CPU pop the most recent address from the stack and redirects execution by jumping to it, effectively resuming execution on the subroutine or function that initiated the call.
This stack arrangement is very powerful, as it allows you to nest a high number of independent subroutine calls (allowing generic, reusable libraries to be built). It also allows recursive calls, where a function can call itself (either directly, or indirectly through a nested subroutine).
Additionally, nothing prevents you from pushing custom data on the stack (there are special CPU instructions for this) AS LONG AS THE STACK STATE IS RESTORED BEFORE RETURNING FROM A SUBROUTINE. Otherwise, when the RET instruction pops the expected return address, it will fetch garbage and try to jump to it, most likely crashing. (Incidentally, this is also how many malware exploits work: by overwriting the return address on the stack with a valid address, they force the CPU to jump to malicious code when it performs a RET instruction.)
This stack feature may be used, for example, to store the original state of the CPU registers that are modified inside a subroutine - allowing the code to restore their values before the subroutine exits so that the caller subroutine can see the registers in the same state as they were BEFORE performing the subroutine CALL.
Languages like C also use this feature to allocate local variables by setting up a Stack Frame. The compiler basically adds up how many bytes are required to account for every local variable in a certain subroutine, and will emit CPU instructions that will displace the top of the stack by this computed byte amount when a subroutine is called. Now every local variable can be accessed as a relative offset to the current stack's state.
-------------
------------- local variables for subroutine N
-------------
ret_address_N
------------- local variables for subroutine 3
ret_address_3
------------- local variables for subroutine 2
-------------
ret_address_2
-------------
------------- local variables for subroutine 1
-------------
-------------
ret_address_1
Besides emitting instructions to set up the stack frame (effectively allocating local variables on the stack and preserving current register values), the C compiler will emit instructions that restore the stack to its state before the function call, so that the RET instruction finds the correct memory address at the top of the stack when it pops the value it should jump to.
Now you can understand why you should not return a pointer to a local variable. By doing so, you are returning the address of a value that was stored temporarily on the stack. You can dereference the pointer and MIGHT see what looks like valid data immediately after returning from the subroutine, but this data will certainly be overwritten, probably in the very near future, as program execution continues calling subroutines.
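The overwriting described above can be imitated with a toy model. This is only a sketch: ToyStack, push_local, pop_local and stale_read_demo are invented names, the "stack" is a plain std::vector that grows upward, and return addresses are left out entirely. It shows only why a stale slot first looks valid and then changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a stack: a fixed block of memory plus an index that
// marks the next free slot. Not how a real CPU stack is laid out.
struct ToyStack {
    std::vector<int> memory = std::vector<int>(16, 0);
    std::size_t top = 0;

    // "Enter a function": reserve one slot for a local variable.
    std::size_t push_local(int value) {
        assert(top < memory.size());
        memory[top] = value;
        return top++;
    }

    // "Return from the function": release the slot without erasing it.
    void pop_local() {
        assert(top > 0);
        --top;
    }
};

// Returns what a stale "pointer" (slot index) sees after a later call.
int stale_read_demo() {
    ToyStack stack;
    std::size_t a_slot = stack.push_local(12345);  // first call stores its local
    stack.pop_local();                             // first call returns
    assert(stack.memory[a_slot] == 12345);         // stale data still looks valid
    stack.push_local(777);                         // next call reuses the same slot
    return stack.memory[a_slot];                   // the old value is gone
}
```

In this model stale_read_demo() returns 777, not 12345. On a real machine the same reuse happens, but when and with what the slot is overwritten is unpredictable, and reading through the dangling pointer is undefined behavior.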
look at my code:
#include <iostream>
using namespace std;
class MyClass{
public:
char ch[50] = "abcd1234";
};
MyClass myFunction(){
MyClass myClass;
return myClass;
}
int main()
{
cout<<myFunction().ch;
return 0;
}
I can't understand where my return value is stored. Is it stored on the stack? On the heap? And does it remain in memory until my program finishes?
If it is stored on the stack, can I be sure that my class values never change?
Please explain the mechanism of this return, and whether returning a structure is different from returning a class.
MyClass myClass; is stored on the stack. It's destroyed immediately after myFunction() exits.
When you return it, a copy is made on the stack. This copy exists until the end of the enclosing expression: cout << myFunction().ch;
Note that if your compiler is smart enough, the second object shouldn't be created at all. Rather, the first object will live until the end of the enclosing expression. This is called NRVO, named return value optimization.
Also note that the standard doesn't define "stack". But any common implementation will use a stack in this case.
if returning structure is different to returning class?
There are no structures in C++; keyword struct creates classes. The only difference between class and struct is the default member access, so the answer is "no".
It's up to the implementation to find a sensible place to store that value. While it's usually on the stack, the language definition does not impose any requirements on where it's actually stored. The returned value is a temporary object, and it gets destroyed at the end of the full-expression in which it is created; here, that is the ; at the end of the line that calls myFunction().
When you create a local object in a function, it is destroyed as soon as the function finishes executing, just like any other local variable.
But when you return an object from a function, the compiler first creates an unnamed temporary to hold the return value (typically on the stack, not on the heap) and then destroys the local object you created. The caller uses that temporary, which is then destroyed in turn at the end of the calling expression.
A local object that you create without the keyword new has automatic storage, which in practice means the stack.
Yes, the contents of your variable ch will not change unless you access that variable and change it yourself.
The instance returned by myFunction is a temporary. It disappears when it stops being useful, so it no longer exists after the cout <<... statement. Just add a destructor and you will see when it is called.
What do you mean by "can i be sure that my class values never change?" You get a copy of the instance.
"returning structure is different to returning class?": a struct is like a class where everything is public by default; that is the only difference.
Your function returns a copy of an object. That copy is stored on the stack.
The local object exists only until the function returns; after that it is destroyed. The expression cout<<function(); then works with the copy returned by the function, and that copy is destroyed once the expression has finished executing.
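The suggestion to add a destructor and watch when it runs can be sketched like this. Traced, make_traced, events and run_lifetime_demo are hypothetical names; the class is a stand-in for MyClass with a destructor that records each call in a global log.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // hypothetical global log of what happened

struct Traced {
    char ch[50] = "abcd1234";
    ~Traced() { events.push_back("dtor"); }  // record every destruction
};

Traced make_traced() {
    Traced local;
    return local;  // with NRVO, local is built directly in the return slot
}

// Uses the returned temporary in one expression, then logs a marker.
std::vector<std::string> run_lifetime_demo() {
    events.clear();
    char first = make_traced().ch[0];    // the temporary lives until the ';'
    events.push_back("next statement");  // logged after the temporary is gone
    assert(first == 'a');
    return events;
}
```

The log always ends with "next statement", preceded by one "dtor" entry if the compiler elides the copy and two if it does not. Either way, every destructor has run before the following statement starts.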
Right now I'm learning the ins and outs of C and C++. I know that when you create an array inside a function, it is stored inside that function's stack frame. You can return the base address of the array, which is in fact a pointer to the first element in that array. That returned pointer value gets stored into the EAX/RAX register, and the value from the register is then moved into a pointer variable local to the calling function. The problem is that when the function returns, that function's stack frame gets popped off the call stack, and any data declared inside that function's stack frame expires. The pointer is now pointing to an invalid memory location.
I want to be able to return an array from a called function BY VALUE, not by pointer. The array has to be created inside the function and stored on the stack. I want to return an array by value just as you would return an int that was declared inside the called function.
int f() {
int a = 5;
return a; // returned by value
}
int main() {
int b = f();
return 0;
}
Here the int value is moved into the EAX/RAX register, so it is a copy. The called function's stack frame is cleared off the call stack, but there is no problem since the returned value is now stored in the register just before copying it into int b.
I know that in C++ I can create a vector inside the called function and then return it by value. But I don't want to use such higher level abstractions in favor of learning a "hacky" way to do it. I'll come back to vectors in a bit.
Well, I realized that it is possible to return a struct object by value from a function. So my solution to returning an array by value is very simple: put it inside a struct, and return that struct by value!
struct String {
char array[20];
};
struct String f() {
struct String myString;
strcpy(myString.array, "Hello World");
return myString; // Is this returned by value?
}
int main() {
struct String word = f();
printf("%s\n", word.array);
}
Please tell me whether I understand the code correctly. That struct object gets created inside the called function's stack frame, "Hello World" is copied into the array contained within, and then what?
The struct String word is an lvalue and f() returns an rvalue. When one struct is assigned to another, all of its data members are copied one by one.
What happens in between, just after the struct is returned from the called function by value, and before it is assigned to the struct inside the main() function? The EAX/RAX register is the destination for returned values. It is either 32 or 64 bits wide, depending on whether you have a 32-bit or 64-bit computer. How exactly do you fit a struct object into a register? Suppose the array is not just 20 bytes but, say, 100 bytes! Is the struct copied from the function into the register piece by piece? Or is it copied from one memory location on the stack to another by value all in one go? And what happens to the original struct object which was created inside the called function? Those are all questions that I'd like to know the answers to.
Also, about returning vectors from functions by value. Vectors in C++ are classes, and classes are similar to structs. Can you answer the question, what happens when you return a vector by value from a function? And what happens when you pass a class/struct object into a function as a parameter?
I can imagine how pass by value works with small data types. I don't even know how it works for complex data types and data structures.
The precise mechanism is platform-dependent. But the most common mechanism is that the caller allocates space on its stack for the struct to be returned and passes the address of that space as an extra argument, usually before all the real arguments.
On many platforms, structs small enough to fit in a register will be returned as though they were a single value. This would apply on x86-64 for a struct consisting of two 32-bit ints, since they could be returned in a single 64-bit register. How large a struct can be handled this way will vary from platform to platform.
The cost of passing larger structs by value can be ameliorated by copy elision. If, for example, you write
struct MyThingy blob = blobMaker();
the compiler is likely to pass blobMaker the address of the variable blob rather than allocating a temporary variable and then copying the temporary to blob after the function returns. The called function may also be able to avoid copies:
struct MyThingy blobMaker(void) {
struct MyThingy retval;
// ...
retval.member1 = some_calc(42);
// ...
retval.member2 = "Hello";
// ...
return retval;
}
Here, the compiler might choose not to allocate retval in the called function's stack frame, but instead just use the storage passed in the invisible argument directly, thus avoiding a copy at the return. The combination of these two optimisations (when possible) makes returning structs almost free.
The C++ standard provides for these optimisations by explicitly allowing them even in cases where the elided copies might have triggered side effects in the object's copy constructor. (Obviously this case doesn't exist in C.)
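To make the hidden-argument mechanism concrete, here is a hand-written approximation using the String struct from the question. f_lowered and both_forms_agree are invented for the illustration. The real lowering is done by the compiler according to the platform's calling convention and is not visible at source level.

```cpp
#include <cassert>
#include <cstring>

struct String {
    char array[20];
};

// Source-level form: the struct is returned by value.
String f() {
    String myString;
    std::strcpy(myString.array, "Hello World");
    return myString;
}

// Hypothetical lowered form: the caller supplies the result storage and
// the function fills it in through a pointer, so nothing is copied back.
void f_lowered(String* result) {
    assert(result != nullptr);
    std::strcpy(result->array, "Hello World");
}

// Checks that both forms leave the same bytes in the caller's object.
bool both_forms_agree() {
    String by_value = f();
    String by_pointer;       // storage allocated in the caller's frame
    f_lowered(&by_pointer);  // its address plays the role of the hidden argument
    return std::strcmp(by_value.array, by_pointer.array) == 0;
}
```

Seen this way, a 100-byte struct never has to fit in a register: only the address of the destination travels with the call.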
Where exactly is the 'this' pointer stored in memory? Is it allocated on the stack, in the heap, or in the data segment?
#include <iostream>
using namespace std;
class ClassA
{
int a, b;
public:
void add()
{
a = 10;
b = 20;
cout << a << b << endl;
}
};
int main()
{
ClassA obj;
obj.add();
return 0;
}
In the above code I am calling the member function add() and the receiver object is passed implicitly as the 'this' pointer. Where is this stored in memory?
The easiest way is to think of this as being a hidden extra argument that is always passed automatically.
So, a fictional method like:
size_t String::length(void) const
{
return strlen(m_string);
}
is actually more like this under the hood:
size_t String__length(const String *this)
{
return strlen(this->m_string);
}
and a call like:
{
String example("hello");
cout << example.length();
}
becomes something like:
cout << String__length(&example);
Note that the above transformation is simplified, hopefully to make my point a bit clearer. No need to fill up the comments with "whaaa, where's the marshalling for method overloading, huh?"-type objection, please. :)
That transforms the question into "where are arguments stored?", and the answer is of course "it depends". :)
It's often on the stack, but it could be in registers too, or any other mechanism that the compiler considers is good for the target architecture.
Other answers have done a very good job explaining how a typical compiler implements this (by passing it as an implicit first parameter to the function).
I think it's also useful to see what the C++ ISO spec explicitly says about this. According to the C++03 ISO spec, §9.3.2/1:
In the body of a nonstatic (9.3) member function, the keyword this is a non-lvalue expression whose value is the address of the object for which the function is called.
It's important to note that this is not a variable - it's an expression, much in the same way that the expression 1 + 2 * 3 is an expression. The value of this expression is permitted to be stored pretty much anywhere. The compiler might put it on the stack and pass it as an implicit parameter to a function, or it might put it in a register, and it conceivably could put it in the heap or in the data segment. The C++ specification deliberately gives the implementation some flexibility here.
I think that the "language-lawyer" answer is "this is completely implementation-defined, and moreover this is technically not a pointer, but an expression that evaluates to a pointer."
Hope this helps!
this is usually passed as a hidden argument of the method (the only difference throughout different calling conventions is how).
If you call:
myClass.Method(1, 2, 3);
The compiler generates code that behaves like the following:
Method(&myClass, 1, 2, 3);
Here the first parameter is actually the this pointer.
Let's check the following code:
class MyClass
{
private:
int a;
public:
void __stdcall Method(int i)
{
a = i;
}
};
int main(int argc, char *argv[])
{
MyClass myClass;
myClass.Method(5);
return 0;
}
By using __stdcall I forced the compiler to pass all parameters through the stack. If you then start the debugger and inspect the assembly code, you'll find something like the following:
myClass.Method(5);
00AA31BE push 5
00AA31C0 lea eax,[myClass]
00AA31C3 push eax
00AA31C4 call MyClass::Method (0AA1447h)
As you can see, the parameter of the method is pushed onto the stack, then the address of myClass is loaded into the eax register and pushed onto the stack as well. In other words, this is treated as a regular parameter of the method.
this is an rvalue (you cannot take its address), so it doesn't (necessarily) occupy memory at all. Depending on the compiler and the target architecture, it will often be in a register: i0 on a Sparc, ECX with MSVC on Intel, etc. When the optimizer is active, it can even move around. (I've seen it in different registers with MSVC).
this behaves mostly like a function argument, and as such will be stored on the stack or - if the binary calling conventions of the architecture allow that - in a register.
this isn't stored at a well-defined location! The object that it points to is stored somewhere, and has a well-defined address, but the address itself does not have a specific home address. It is communicated around in the program. Not only that, but there can be many copies of that pointer.
In the following imaginary init function, the object registers itself to receive events and timer callbacks (using imaginary event source objects). So after the registration, there are two additional copies of this:
void foo_listener::init()
{
g_usb_events.register(this); // register to receive USB events
g_timer.register(this, 5); // register for a 5 second timer
}
In a function activation chain, there will also be multiple copies of the this pointer. Suppose we have an object obj and call its foo function. That function calls the same object's bar function, and bar calls another function called update. Each function activation level has its own copy of the this pointer, stored in a machine register or in a memory location in the stack frame of that activation.
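A small sketch of that activation chain: each level records the this value it received, and all of them turn out to be the address of the same object. The class Obj, its seen member and the helper all_levels_share_this are made up for the example.

```cpp
#include <cassert>
#include <vector>

class Obj {
public:
    std::vector<const Obj*> seen;  // the this value observed at each level

    void foo()    { seen.push_back(this); bar(); }
    void bar()    { seen.push_back(this); update(); }
    void update() { seen.push_back(this); }
};

// Calls foo on a local object and checks every recorded pointer.
bool all_levels_share_this() {
    Obj obj;
    obj.foo();
    assert(obj.seen.size() == 3);  // one entry per activation level
    for (const Obj* p : obj.seen) {
        if (p != &obj) return false;
    }
    return true;
}
```

The vector ends up holding three more copies of the same address, which illustrates the point that the pointer value has no single home while the object it points to does.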
I have the following code:
void Foo() {
static std::vector<int>(3);
// Vector object is constructed every function call
// The destructor of the static vector is invoked at
// this point (the debugger shows so)
// <-------------------
int a;
}
Then somewhere I call Foo several times in a sequence
Why does the vector object get constructed on every Foo() call, and why is the destructor called right after the static ... declaration?
Update:
I was trying to implement a call-once mechanism for functions, and I thought that writing something like
static core::CallOnce(parameters), where CallOnce is a class name, would be very nice.
To my mind, writing static core::CallOnce call_once(parameters) looks worse, but okay, there is nothing I can do about that.
Thank you.
Your variable needs a name:
static std::vector<int> my_static_vector(3);
You forgot to give the vector a name, so what you create is an unnamed temporary, and it is destroyed immediately after it's created.
Because std::vector<int>(3) creates an unnamed temporary, which lives only to the end of its containing expression. The debugger can't show destruction on the same line as construction though, so it shows it on the next line.
Give the item a name and normal static semantics will apply.
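The difference can be checked with a small counting class. This is a sketch with invented names (Counted, with_name, without_name, constructions_during_three_calls), and the unnamed case is written with braces so that it is plainly an expression rather than the declaration form from the question.

```cpp
#include <cassert>

struct Counted {
    static int constructions;
    Counted() { ++constructions; }  // count every object that is built
};
int Counted::constructions = 0;

void with_name() {
    static Counted once;  // named static local: constructed on the first call only
    (void)once;
}

void without_name() {
    Counted{};  // unnamed temporary: built and destroyed on every call
}

// Returns how many constructions three calls to fn trigger.
int constructions_during_three_calls(void (*fn)()) {
    assert(fn != nullptr);
    int before = Counted::constructions;
    for (int i = 0; i < 3; ++i) fn();
    return Counted::constructions - before;
}
```

Assuming with_name has not been called earlier in the program, constructions_during_three_calls(with_name) gives 1, while constructions_during_three_calls(without_name) gives 3.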