Look at this code:
class Test
{
//
};
Test TestAddress()
{
Test test;
cout << "Object Address in function: " << (int)&test << endl;
return test;
}
int IntAddress()
{
int test;
cout << "Integer Address in function: " <<(int)&test << endl;
return test;
}
int main() {
int x = IntAddress();
cout << "Integer Address in Main: " <<(int)&x << endl;
Test object = TestAddress();
cout << "Object Address in Main: " <<(int)&object << endl;
return 0;
}
The output is:
Integer Address in function: 1076679252
Integer Address in Main: 1076679220
Object Address in function: 1076679221
Object Address in Main: 1076679221
Could someone explain why, when I return an object by value, I get the same address in the function and in main, but when I do the same thing with an integer, the addresses are different?
I think that your compiler is applying return value optimization, where main() and TestAddress() are acting on the same object variable in memory.
The so called "automatic variables" are allocated on the stack.
The moment of their allocation can be subject to compiler optimization.
When a local variable is returned by value, the compiler has essentially two choices:
Allocate the variable within the function, then, at the return
Create a temporary and copy the variable in it and
Destroy the variable
then at the = in the caller, copy the temporary into the caller receiving constructing variable.
Or, it can:
see that the function local variable is the only expression used in the return statement (there is just one return, there, so that's trivial)
see that the return value is actually used to construct the caller variable, hence...
allocate only one variable in the caller's space, just before the call, make the function-local one a reference to it, and eliminate all the copying.
Now, since the CPU cost of copying an integer is lower than the cost of dereferencing one (an int is the CPU's primitive number type), and since a temporary integer can fit in a CPU register (so it is not a copy to memory), the compiler uses the first method for ints.
Since a class can have any size (and a copy may have a higher cost), the compiler can decide to adopt the second method.
In any case, you should not care about that: the external observable behavior will be the same, since the access to the external and internal variables are mutually exclusive.
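A minimal sketch of how one could observe which of the two strategies the compiler picked. The names Big, make_big and address_was_reused are hypothetical and not from the question. The address is stored as an integer so that the later comparison never touches a dead object, and neither outcome is guaranteed by the language.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type, large enough that returning it in a register is unlikely.
struct Big {
    int payload[64];
};

// Address of the function-local variable, recorded as a plain integer.
static std::uintptr_t g_address_in_function = 0;

Big make_big() {
    Big local{};
    local.payload[0] = 42;
    g_address_in_function = reinterpret_cast<std::uintptr_t>(&local);
    return local;  // the compiler may construct 'local' directly in the caller's storage
}

// True if the caller's variable ended up at the address used inside make_big().
bool address_was_reused() {
    Big result = make_big();
    assert(result.payload[0] == 42);  // the value arrives intact either way
    return reinterpret_cast<std::uintptr_t>(&result) == g_address_in_function;
}
```

Calling address_was_reused() from main and printing the result shows what a given compiler and optimization level did; a program must not depend on the answer.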
First, you don't need to cast your pointers to int, as operator<<() has an overload that accepts a void* for printing pointer addresses. Any pointer can be passed to << without casting (unless there is a specialized overload present for that pointer type, like the one for char*, in which case you should cast to void* instead of int).
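A small sketch of the difference between the two overloads mentioned above, using hypothetical helper names (format_address, format_as_text). The textual form of a printed address is implementation-defined, so nothing here depends on it.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats any object pointer using the operator<< overload for const void*.
std::string format_address(const void* p) {
    std::ostringstream out;
    out << p;  // prints the address; the exact text is implementation-defined
    return out.str();
}

// Streams a char* unchanged: that overload prints the characters, not the address.
std::string format_as_text(const char* s) {
    assert(s != nullptr);  // streaming a null char* is undefined behaviour
    std::ostringstream out;
    out << s;
    return out.str();
}
```

Passing a char* to format_address converts it to const void* implicitly, which is the same effect as the cast recommended above.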
Second, most compilers implement Return Value Optimization when a function returns an object by value (RVO is not used for primitive types, as copies are cheap). So your code is essentially acting more like this behind the scenes:
void TestAddress(Test &result)
{
cout << "Object Address in function: " << &result << endl;
}
int main()
{
//...
Test object;
TestAddress(object);
cout << "Object Address in Main: " << &object << endl;
return 0;
}
Or even this:
void TestAddress(Test &result)
{
new (&result) Test(); // <-- constructor called here instead!
cout << "Object Address in function: " << &result << endl;
}
int main()
{
//...
Test object; // <-- constructor not called here!
TestAddress(object);
cout << "Object Address in Main: " << &object << endl;
return 0;
}
Either way, the compiler is smart enough to know that the temp object inside the function is going to end up in the caller's object, so it eliminates the temp and acts on the caller's object directly instead.
That is why you see the same memory address being reported by two output statements seemingly acting on two separate objects - they are actually the same object in memory when RVO is being used.
Wyjun,
When you return an object by value, both of those objects contain the same pointer. However, when you return a primitive by value, the primitive within the function scope is destroyed, its value having already been copied into the caller's stack frame, where it occupies a new memory location.
Please let me know if you have any questions!
Thank you for your time,
If I return a vector from a function, the object to which it is assigned will have the same address (without returning it as a reference) as the one declared in the function. For example:
vector<int> f() {
vector<int> foo(5);
cout << &foo << endl;
return foo;
}
int main() {
vector<int> bar = f();
cout << &bar << endl; // == &foo
return 0;
}
Then I assumed that this is happening because of the copy constructor and the & operator may be overloaded in a way such that it prints the address of a specific member of the vector class which was copied from foo to bar.
But, if I change the scenario and move f() inside a class, the behaviour would be as I initially expected: &foo != &bar:
class A {
vector<int> foo;
public:
vector<int> f() {
foo.push_back(10);
cout << &foo << endl;
return foo;
}
};
int main() {
A a;
vector<int> bar = a.f();
cout << &bar << endl; // != &foo
return 0;
}
Can you explain what happens?
What does the address of a vector represent?
Computer memory can be thought of as an array of bytes. A memory address is an index into that figurative array. This is the same for all objects, including vectors.
If I return a vector from a function, the object to which it is assigned will have the same address (without returning it as a reference) as the one declared in the function.
This is not guaranteed by the standard. But it is indeed possible. It will happen whenever Named Return Value Optimisation is used to elide the copy/move of the return value.
Then I assumed that this is happening because of the ... & operator may be overloaded in a way such that ...
The addressof operator of vector is not overloaded.
Can you explain what happens?
You return a copy of a member rather than a local automatic variable. NRVO cannot be used in this case.
Disregarding subobjects, more than one object cannot overlap the same memory at any given time. If two objects exist simultaneously, then they each must have a distinct address. On the other hand, once the memory of one object is released, it can be reused by another object.
In the first example, the lifetime of the local variable ends, so there is no problem for its memory to overlap with the memory of the variable that is initialised from the return value.
In the second example, the lifetime of a and thus also its member overlaps with the lifetime of bar, and therefore they cannot overlap in memory.
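A minimal sketch of the second case, assuming a hypothetical Holder struct that stands in for class A with its member made public so the addresses can be compared. Because the member and the copy are alive at the same time, their addresses (and their element buffers) must differ.

```cpp
#include <cassert>
#include <vector>

struct Holder {                          // hypothetical stand-in for class A
    std::vector<int> foo;
    std::vector<int> f() {
        foo.push_back(10);
        return foo;                      // copies the member; NRVO does not apply
    }
};

// Returns true when the returned copy is a separate object from the member.
bool copy_is_distinct_from_member() {
    Holder a;
    std::vector<int> bar = a.f();
    assert(bar == a.foo);                // same contents
    return &bar != &a.foo                // two live objects: distinct addresses
        && bar.data() != a.foo.data();   // and distinct element buffers
}
```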
I am currently reading the second edition of C++: A Beginner's Guide by Herbert Schildt.
In Module 9.4, he talks about returning objects:
Just as objects can be passed to functions, functions can return objects. To return an object, first declare
the function as returning a class type. Second, return an object of that type using the normal return
statement. The following program has a member function called mkBigger( ). It returns an object that
gives val a value twice as large as the invoking object.
This is the 'following program' he mentions:
// Returning objects.
#include <iostream>
using namespace std;
class MyClass {
int val;
public:
// Normal Constructor.
MyClass(int i) {
val = i;
cout << "Inside constructor\n";
}
~MyClass() {
cout << "Destructing\n";
}
int getval() { return val; }
// Return an object.
MyClass mkBigger() {
MyClass o(val * 2); // mkBigger() returns a MyClass object.
return o;
}
};
void display(MyClass ob)
{
cout << ob.getval() << '\n';
}
int main()
{
cout << " Before Constructing a.\n";
MyClass a(10);
cout << "After constructing a.\n\n";
cout << "Before call to display.\n";
display(a);
cout << "After display() returns.\n\n";
cout << "Before call to mkBigger().\n";
a = a.mkBigger();
cout << "After mkBigger() returns.\n\n";
cout << "Before second call to display.\n";
display(a);
cout << "After display() returns.\n\n";
return 0;
}
This gives us the following output:
Before Constructing a.
Inside constructor
After constructing a.
Before call to display.
10
Destructing
After display() returns.
Before call to mkBigger().
Inside constructor
Destructing
Destructing
After mkBigger() returns.
Before second call to display.
20
Destructing
After display() returns.
Destructing
Schildt then goes on to explain that the reason there are two 'Destructing' messages during the mkBigger() call is because of the fact that:
when an object is returned by a function, a temporary object is automatically created, which holds the return value. It is this object that is actually returned by the function. After the value has been returned, this object is destroyed.
I was actually surprised there weren't 3 'Destructing' messages. I have the following issue: Given the definition of mkBigger(), a new MyClass instance is created, and it is that instance that is returned and placed in the address of a. Thus, when doing
a = a.mkBigger();
My impression is thus that the original object previously held in a is no longer referenced by a. Is this correct? If so, I then have the following issues:
I was told C++ has some minute notions of garbage collection. Would that object thus be garbage-collected? Where is this object now? Is this an example of the feared memory leaks that many mention when talking about the 'dangers' of C++?
One of the destructors in mkBigger() is called on o, the local MyClass instance; it goes out of scope at the end of the function. The other is called on the temporary copy of o that is returned, when that temporary is destroyed. What else goes out of scope? Not a in main(); therefore you should not expect a third destructor to be called. C++ does not provide garbage collection beyond calling destructors when automatic objects go out of scope.
Unlike some other modern languages, a does not "hold a reference" to an object; a is the object, in that it is a certain number of bytes holding the raw data members. When you do a = a.mkBigger();, MyClass's default assignment operator is called, which simply copies the val inside the temporary object on the right-hand side into the val inside a, overwriting the value that was already there. a = a.mkBigger() would be equivalent to a.val = a.mkBigger().val if val were public.
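A short sketch of that point, using a trimmed-down MyClass without the logging; the helper assignment_keeps_address is a hypothetical name added for illustration. The variable a stays at the same address across the assignment, and only its val changes.

```cpp
#include <cassert>

class MyClass {
    int val;
public:
    explicit MyClass(int i) : val(i) {}
    int getval() const { return val; }
    MyClass mkBigger() const { return MyClass(val * 2); }
};

// Returns true if 'a' occupies the same storage before and after the assignment.
bool assignment_keeps_address() {
    MyClass a(10);
    const MyClass* before = &a;
    a = a.mkBigger();            // copies val from the temporary into the existing a
    assert(a.getval() == 20);    // the value was overwritten in place
    return before == &a;
}
```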
Memory leaks occur when you use new to allocate memory and then fail to use delete to deallocate that memory. For classes that do this internally, you must write at least your own copy constructor, assignment operator, and destructor.
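A minimal sketch of such a class, assuming a hypothetical IntBuffer that owns an array allocated with new[]. It defines the three members named above so that every copy owns its own array and every array is released exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

class IntBuffer {                        // hypothetical owner of a new[] array
    std::size_t size_;
    int* data_;
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Copy constructor: allocate a separate array and copy the elements.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Assignment operator: build the new array first, then release the old one.
    IntBuffer& operator=(const IntBuffer& other) {
        if (this != &other) {
            int* fresh = new int[other.size_];
            std::copy(other.data_, other.data_ + other.size_, fresh);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    // Destructor: every new[] is matched by one delete[].
    ~IntBuffer() { delete[] data_; }

    int& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    std::size_t size() const { return size_; }
};
```

In practice a std::vector<int> member would make all three functions unnecessary; the sketch only shows what they have to do when the class manages the memory itself.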
Considering the following minimal code:
class MyClass {
public:
MyClass() {}
};
MyClass myfunc() {
MyClass obj;
cout << "Address of obj in myFunc " << &obj << endl;
return obj;
}
int main() {
MyClass obj(myfunc());
cout << "Address of obj in main " << &obj << endl;
return 0;
}
I obtain the following output:
Address of obj in myFunc 0x7fff345037df
Address of obj in main 0x7fff3450380f
Now, just by adding a destructor in MyClass, I get the following output:
Address of obj in myFunc 0x7fffb6aed7ef
Address of obj in main 0x7fffb6aed7ef
Showing that both objects are now the same... Is this just a coincidence ?!
Also, what does exactly happen in:
MyClass obj(myfunc());
I have overloaded the copy constructor to print a message, but it never appears...
By adding a destructor (whatever it was that you actually did, you're not showing the code) the behavior changed to use Return Value Optimization, known as RVO.
Then a pointer to the caller's storage is passed to the function, and the function constructs the object directly in that storage, instead of e.g. copying a value in a processor register or set of registers.
The same calling convention, with a hidden result storage pointer, can also be used without RVO. Without RVO a copy or move is performed at the end of the function. The standard supports RVO under certain conditions, but, while it can be reasonably expected, a compiler is not under any obligation to perform it.
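A small sketch of why the copy-constructor message may never appear, assuming a hypothetical Counted type that counts copies instead of printing. The count depends on the compiler and its settings, so the code reports it rather than assuming a value.

```cpp
#include <cassert>

struct Counted {                 // hypothetical type that counts its copies
    static int copies;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    ~Counted() {}
};
int Counted::copies = 0;

Counted make_counted() {
    Counted obj;
    return obj;                  // candidate for RVO: the copy may be elided
}

// Number of copy-constructor calls used to initialise a variable from make_counted().
int copies_for_one_return() {
    Counted::copies = 0;
    Counted obj(make_counted());
    assert(Counted::copies >= 0);
    return Counted::copies;
}
```

With RVO applied the function returns 0, which matches the missing message in the question. GCC and Clang accept -fno-elide-constructors, which makes the otherwise elided copies visible.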
I've got a few doubts about references in C++.
Test & returnref(){
Test obj(9,9);
cout << "in function: " << &obj << endl;
return obj;
} // *
int main(){
Test & asdf = returnref();
Test asdf2 = returnref();
cout << "in main asdf: " << &asdf;
cout << "in main asdf2: " << &asdf2;
cin.get();
return 0;
}
the result:
in function: 0033F854
in function: 0033F854
in main asdf: 0033F854
in main asdf2: 0033F938
is it correct?
in my opinion the obj is being removed on 5th line (*) - because it's alive in this function scope.
so why it's working? Is it just Visual Studio? or maybe I'm wrong?
You are allocating an object on the function's stack, and when the function returns, that object is destroyed. Allocate it dynamically using new (followed by a delete, of course), and then do whatever you need.
Your options are to return by value or to return a reference/pointer to a heap-based object.
Return by value
Changing your function signature to this
Test returnval()
will copy obj. Note that the pointers you print out may still have the same value for the object inside the function and the object outside, as the compiler may have performed a return value optimisation.
If the Test class is not managing dynamically allocated resources, then you can rely on the automatically created copy constructors that the compiler will inject. If Test has dynamically allocated data, then you must write your own. See What is the Rule of Three?.
Return a pointer (or reference) to a heap-based object
You can change it to a heap-based object by using new, and then return a pointer instead:
Test* returnptr(){
Test* obj = new Test(9,9);
cout << "in function: " << obj << endl;
return obj;
}
Or better yet, a smart pointer like shared_ptr to manage the deletion for you:
shared_ptr<Test> returnptr() {
// Wrapping the pointer in a shared_ptr will ensure it gets cleaned
// up automatically when the last reference to it (usage of it)
// goes out of scope.
shared_ptr<Test> obj(new Test(9,9));
cout << "in function: " << obj.get() << endl;
return obj;
}
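A sketch of the same idea with std::unique_ptr, for the case where only one owner is needed. It assumes C++14 for std::make_unique and a hypothetical two-int Test class, since the question does not show the real one.

```cpp
#include <cassert>
#include <memory>

struct Test {                    // hypothetical stand-in for the question's class
    int x, y;
    Test(int x_, int y_) : x(x_), y(y_) {}
};

// The caller receives sole ownership; the object is deleted automatically.
std::unique_ptr<Test> make_test() {
    return std::make_unique<Test>(9, 9);   // requires C++14
}

int sum_of_fresh_test() {
    std::unique_ptr<Test> p = make_test();
    assert(p != nullptr);
    return p->x + p->y;
}   // p goes out of scope here and the Test is destroyed
```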
Final note
As pointed out by one of the commenters on my answer, in C++11, you have further options to control how the temporary object from your function is returned by providing move constructors for Test and using std::move as necessary. This is a fairly meaty subject, but you can read more about it at the following links:
What is move semantics?
Rvalue References: C++0x Features in VC10
A Brief Introduction to Rvalue References
Using a reference to a local variable returned by a function has undefined behavior. But note that UB doesn't mean "it will crash"; it means "I don't know what will happen".
In your case, your calls don't reuse the memory used by that stackframe, so your local variable is still there.
Is it possible/OK to return a const reference even if the value the function returns is a local variable of this function? I know that locals are not valid anymore once the function returns, but what if the function is inlined and the returned value is only used within the caller's scope? Then the locals of the function should be included in the caller's stack frame, no?
Don't count on it. Even if this works on 1 compiler, it's not standard supported behavior and is likely to break on others.
No, it's not OK. Local variables are declared on the stack, and the stack keeps changing between method calls. Also, the objects that get out of scope get destroyed. Always return a copy of a local variable.
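A minimal sketch of the safe alternative: return by value and, if a reference is wanted, bind the returned temporary to a const reference in the caller, which keeps that temporary alive for as long as the reference is in scope. Tracked and the helper names are hypothetical.

```cpp
#include <cassert>

struct Tracked {                 // hypothetical type that counts live instances
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

Tracked make_tracked() {
    Tracked t;
    return t;                    // returned by value, never by reference
}

// Returns how many Tracked objects are alive while the reference is in scope.
int alive_while_bound() {
    const Tracked& ref = make_tracked();   // the temporary lives as long as ref
    (void)ref;
    assert(Tracked::alive == 1);           // the function's local is already gone
    return Tracked::alive;
}
```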
Consider this code:
#include <iostream>
using namespace std;
class MyClass
{
public:
MyClass() { cout << "ctor" << endl; }
~MyClass() { cout << "dtor" << endl; }
MyClass(const MyClass& r) { cout << "copy" << endl; }
};
const MyClass& Test()
{
MyClass m;
return m;
}
int main()
{
cout << "before Test" << endl;
MyClass m = Test();
cout << "after Test" << endl;
}
This will print out:
before Test
ctor
dtor
copy
after Test
dtor
The object you're trying to copy has already called its destructor and may be in an invalid state.
inline is not a guarantee -- it's a suggestion. Even if you use tricks to force inline, you'll never be sure about the result, especially if you want to remain portable.
Hence, don't do it.
Doing that invokes undefined behaviour.
There's no way of forcing a compiler to inline the function. inline is just a suggestion, and so is __forceinline.
Even if you could guarantee that the function would be inlined, the destructor for the variable in question will still be executed, leaving you with a reference to a dead object.
And the big one - C++'s concept of the stack is delimited by scope - not by function.
#include <iostream>
int main()
{
{
int a = 5;
std::cout << std::hex << "0x" << &a << std::endl;
}
{
int b = 10;
std::cout << std::hex << "0x" << &b << std::endl;
}
}
My compiler puts 'a' and 'b' at different memory addresses, except when I turn optimizations on. Yours may well decide that it's an optimization to reuse the memory your object previously occupied.
Is there a particular problem you're trying to solve here? There are other ways of reducing the number of temporary objects created if that's your concern.
As others have noted, this is dangerous. It's also unnecessary, if your compiler supports the NRVO (Named Return Value Optimization), and your function uses and returns the local variable you would have liked to return by ref in a fairly simple way.
The NRVO allows the compiler to avoid copy construction under certain conditions - typically the main reason to avoid returning objects by value. VC++ 8 supports this (a delta on previous revisions) and it makes quite a bit of perf diff in frequently used code.
The value falls out of scope when the callee falls out of scope. So no, it is gone.
But if you want a fairly ugly solution (and a red flag warning you that your design might need refactoring), you can do something like this:
const MyObj& GetObj()
{
static const MyObj obj_;
return obj_;
}
...but this solution is fraught with peril, especially if the object is modifiable, or does something non-trivial in a multithreaded environment.
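A compilable sketch of that pattern, assuming a hypothetical MyObj with a single string member. Every call returns a reference to the same object; since C++11 the one-time initialisation of a function-local static is thread-safe, but any later modification of the object would still need its own synchronisation.

```cpp
#include <cassert>
#include <string>

struct MyObj {                   // hypothetical payload type
    std::string name = "default";
};

const MyObj& GetObj() {
    static const MyObj obj_{};   // constructed once, on the first call
    return obj_;                 // valid: obj_ outlives every caller
}

// Returns true if two calls hand back the very same object.
bool same_object_every_call() {
    const MyObj& first = GetObj();
    const MyObj& second = GetObj();
    assert(first.name == "default");
    return &first == &second;
}
```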
The inline keyword doesn't guarantee that the function is really inlined. Don't do it.