If I return a vector from a function, the object to which it is assigned will have the same address (without returning it as a reference) as the one declared in the function. For example:
#include <iostream>
#include <vector>
using namespace std;

vector<int> f() {
    vector<int> foo(5);
    cout << &foo << endl;
    return foo;
}

int main() {
    vector<int> bar = f();
    cout << &bar << endl; // == &foo
    return 0;
}
Then I assumed that this is happening because of the copy constructor and the & operator may be overloaded in a way such that it prints the address of a specific member of the vector class which was copied from foo to bar.
But if I change the scenario and move f() inside a class, the behaviour is what I initially expected: &foo != &bar:
#include <iostream>
#include <vector>
using namespace std;

class A {
    vector<int> foo;
public:
    vector<int> f() {
        foo.push_back(10);
        cout << &foo << endl;
        return foo;
    }
};

int main() {
    A a;
    vector<int> bar = a.f();
    cout << &bar << endl; // != &foo
    return 0;
}
Can you explain what happens?
What does the address of a vector represent?
Computer memory can be thought of as an array of bytes. A memory address is an index into that figurative array. This is the same for all objects, including vectors: &foo is the address of the vector object itself, not of the elements it manages.
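To make the distinction concrete, here is a small sketch (the helper name is made up for illustration). `&v` is the address of the vector object itself, a fixed-size handle that typically holds a few pointers, while `v.data()` points at the separately allocated element storage.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: compares the address of the vector object (&v)
// with the address of its first element (v.data()).
bool handle_and_buffer_differ(const std::vector<int>& v) {
    const void* handle = &v;        // where the vector object itself lives
    const void* buffer = v.data();  // where the elements live (heap, if non-empty)
    return handle != buffer;
}
```

For any non-empty vector, `assert(handle_and_buffer_differ(v));` holds, and `sizeof(std::vector<int>)` is the same whether the vector holds 5 elements or 5000, because only the handle lives at `&v`.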
If I return a vector from a function, the object to which it is assigned will have the same address (without returning it as a reference) as the one declared in the function.
This is not guaranteed by the standard. But it is indeed possible. It will happen whenever Named Return Value Optimisation is used to elide the copy/move of the return value.
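A minimal sketch of the difference between optional NRVO and the copy elision that C++17 guarantees, assuming a C++17 compiler; `Pinned` and `make_pinned` are made-up names. Returning an unnamed prvalue constructs the object straight into the caller's storage, so it works even for a type that can be neither copied nor moved. Returning a named local is only an NRVO candidate, so an accessible copy or move constructor is still required.

```cpp
#include <cassert>

// A type that can be neither copied nor moved.
struct Pinned {
    int value;
    explicit Pinned(int v) : value(v) {}
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

// OK since C++17: the prvalue initializes the caller's object directly,
// so no copy or move constructor is involved.
Pinned make_pinned(int v) {
    return Pinned(v);
}

// Would NOT compile: a named local is only a candidate for NRVO, which is
// optional, so the (deleted) move constructor must still be usable.
// Pinned make_named(int v) { Pinned p(v); return p; }
```

For example, `Pinned p = make_pinned(7); assert(p.value == 7);` compiles and holds, while uncommenting `make_named` produces an error about the deleted constructor.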
Then I assumed that this is happening because of the ... & operator may be overloaded in a way such that ...
The address-of operator is not overloaded for std::vector.
Can you explain what happens?
You return a copy of a member rather than a local automatic variable. NRVO cannot be used in this case.
Disregarding subobjects, two objects cannot occupy overlapping memory at the same time. If two objects exist simultaneously, then they each must have a distinct address. On the other hand, once the memory of one object is released, it can be reused by another object.
In the first example, the lifetime of the local variable ends, so there is no problem for its memory to overlap with the memory of the variable that is initialised from the return value.
In the second example, the lifetime of a and thus also its member overlaps with the lifetime of bar, and therefore they cannot overlap in memory.
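The lifetime argument can be checked directly. In this sketch (loosely following the second example; the out-parameter and the function name are additions for illustration), the member and the result are alive at the same time, so their addresses must differ. No such assertion is possible for the first example, because whether `&foo == &bar` there depends on whether the compiler applies NRVO.

```cpp
#include <cassert>
#include <vector>

class A {
    std::vector<int> foo{10};  // one element with value 10
public:
    // Returns a copy of the member and reports where the member lives.
    std::vector<int> f(const void** member_addr) const {
        *member_addr = &foo;
        return foo;            // copy: foo must stay intact inside *this
    }
};

// a.foo and bar are alive at the same time, so they cannot share storage.
bool member_and_result_differ() {
    A a;
    const void* member_addr = nullptr;
    std::vector<int> bar = a.f(&member_addr);
    return member_addr != static_cast<const void*>(&bar) &&
           bar == std::vector<int>{10};
}
```

`assert(member_and_result_differ());` holds on every conforming compiler.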
I am currently reading the second edition of C++: A Beginner's Guide by Herbert Schildt.
In Module 9.4, he talks about returning objects:
Just as objects can be passed to functions, functions can return objects. To return an object, first declare the function as returning a class type. Second, return an object of that type using the normal return statement. The following program has a member function called mkBigger( ). It returns an object that gives val a value twice as large as the invoking object.
This is the 'following program' he mentions:
// Returning objects.
#include <iostream>
using namespace std;
class MyClass {
int val;
public:
// Normal Constructor.
MyClass(int i) {
val = i;
cout << "Inside constructor\n";
}
~MyClass() {
cout << "Destructing\n";
}
int getval() { return val; }
// Return an object.
MyClass mkBigger() {
MyClass o(val * 2); // mkBigger() returns a MyClass object.
return o;
}
};
void display(MyClass ob)
{
cout << ob.getval() << '\n';
}
int main()
{
cout << " Before Constructing a.\n";
MyClass a(10);
cout << "After constructing a.\n\n";
cout << "Before call to display.\n";
display(a);
cout << "After display() returns.\n\n";
cout << "Before call to mkBigger().\n";
a = a.mkBigger();
cout << "After mkBigger() returns.\n\n";
cout << "Before second call to display.\n";
display(a);
cout << "After display() returns.\n\n";
return 0;
}
This gives us the following output:
Before Constructing a.
Inside constructor
After constructing a.
Before call to display.
10
Destructing
After display() returns.
Before call to mkBigger()
Inside constructor
Destructing
Destructing
After mkBigger() returns.
Before second call to display.
20
Destructing
After display() returns.
Destructing
Schildt then goes on to explain that the reason there are two 'Destructing' messages during the mkBigger() call is that:
when an object is returned by a function, a temporary object is automatically created, which holds the return value. It is this object that is actually returned by the function. After the value has been returned, this object is destroyed.
I was actually surprised there weren't three 'Destructing' messages. I have the following issue: given the definition of mkBigger(), a new MyClass instance is created, and it is that instance that is returned and placed at the address of a. Thus, when doing
a = a.mkBigger();
My impression is thus that the original object previously held in a is no longer referenced by a. Is this correct? If so, I then have the following issues:
I was told C++ has some minute notions of garbage collection. Would that object thus be garbage-collected? Where is this object now? Is this an example of the memory leaks that many mention when talking about the 'dangers' of C++?
One of the destructor calls during the mkBigger() step is for o, the local MyClass object created inside mkBigger(); it goes out of scope at the end of the function. The other is for the temporary copy of o that the function returns; it is destroyed at the end of the full expression a = a.mkBigger();. What else goes out of scope? Not a in main(); therefore you should not expect a third destructor call. C++ does not provide garbage collection beyond calling destructors when automatic objects go out of scope.
Unlike in some other modern languages, a does not "hold a reference" to an object; a is the object, in that it is a certain number of bytes holding the raw data members. When you do a = a.mkBigger();, MyClass's implicitly defined copy assignment operator is called, which simply copies the val inside the temporary object on the right-hand side into the val inside a, overwriting the value that was already there. a = a.mkBigger() would be equivalent to a.val = a.mkBigger().val if val were public.
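A sketch that counts destructor calls instead of printing them, assuming C++17; the counter and the two `mkBigger` variants are additions for illustration. Returning an unnamed `MyClass(val * 2)` constructs the result directly, so exactly one temporary is destroyed by the assignment. Returning a named local, as in the book, gives one or two destructions depending on whether the compiler applies NRVO, which is why modern compilers may print fewer 'Destructing' lines than the book shows. In both cases `a` keeps its address and only its `val` is overwritten.

```cpp
#include <cassert>

class MyClass {
    int val;
public:
    static inline int destroyed = 0;   // counts destructor calls

    explicit MyClass(int i) : val(i) {}
    MyClass(const MyClass&) = default;
    MyClass& operator=(const MyClass&) = default;
    ~MyClass() { ++destroyed; }

    int getval() const { return val; }

    // Book style: named local; NRVO may or may not remove the copy.
    MyClass mkBiggerNamed() const {
        MyClass o(val * 2);
        return o;
    }
    // Unnamed prvalue: the return object is initialized directly (C++17).
    MyClass mkBiggerPrvalue() const {
        return MyClass(val * 2);
    }
};

// Doubles `a` in place and reports how many destructors ran meanwhile.
int destructions_during_assignment(MyClass& a, bool named) {
    MyClass::destroyed = 0;
    if (named) {
        a = a.mkBiggerNamed();    // 1 or 2 destructions, NRVO-dependent
    } else {
        a = a.mkBiggerPrvalue();  // exactly 1: the temporary, at the end of the statement
    }
    return MyClass::destroyed;
}
```

Starting from `MyClass a(10);`, the prvalue variant reports 1 and leaves `a.getval() == 20` at the same address `&a`.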
Memory leaks occur when you use new to allocate memory and then fail to use delete to deallocate that memory. For classes that do this internally, you must write at least your own copy constructor, copy assignment operator, and destructor (the "rule of three").
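A minimal sketch of the rule of three for a hypothetical `IntBuffer` class that owns an array allocated with `new[]`. The copy assignment uses copy-and-swap, which is one common idiom (not the only one); in real code, holding the data in a `std::vector` or `std::unique_ptr` avoids writing these three functions at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class that owns a heap array allocated with new[].
class IntBuffer {
    std::size_t size_;
    int* data_;
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // 1. Copy constructor: give the copy its own buffer (deep copy).
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    // 2. Copy assignment via copy-and-swap: `other` is already a copy, so
    //    swapping is safe even for self-assignment; the old buffer is freed
    //    when `other` is destroyed.
    IntBuffer& operator=(IntBuffer other) {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
        return *this;
    }

    // 3. Destructor: release the buffer exactly once.
    ~IntBuffer() { delete[] data_; }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }
    const int* data() const { return data_; }
};
```

Without the user-written copy constructor, `IntBuffer b = a;` would copy only the pointer, and both destructors would `delete[]` the same array.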
Look at this code:
#include <iostream>
using namespace std;

class Test
{
    //
};

Test TestAddress()
{
    Test test;
    cout << "Object Address in function: " << (int)&test << endl;
    return test;
}

int IntAddress()
{
    int test = 0;
    cout << "Integer Address in function: " << (int)&test << endl;
    return test;
}

int main() {
    int x = IntAddress();
    cout << "Integer Address in Main: " << (int)&x << endl;
    Test object = TestAddress();
    cout << "Object Address in Main: " << (int)&object << endl;
    return 0;
}
The output is:
Integer Address in function: 1076679252
Integer Address in Main: 1076679220
Object Address in function: 1076679221
Object Address in Main: 1076679221
Could someone explain why, when I return an object by value, I get the same address in the function and in main, but when I do the same thing with an integer, the addresses are different?
I think that your compiler is applying return value optimization, where main() and TestAddress() are acting on the same object variable in memory.
The so called "automatic variables" are allocated on the stack.
The moment of their allocation can be subject to compiler optimization.
When a local variable is returned by value, the compiler essentially has two choices. It can:
allocate the variable within the function; then, at the return,
create a temporary and copy the variable into it,
destroy the variable,
and then, at the = in the caller, copy the temporary into the receiving variable being constructed in the caller.
Or it can:
see that the function-local variable is the only expression used in the return statement (there is just one return here, so that's trivial),
see that the return value is actually used to construct the caller's variable, and hence...
allocate only one variable in the caller's space, just before the call, make the function-local one a reference to it, and eliminate all the copying.
Now, since in terms of CPU processing the cost of copying an integer is lower than that of dereferencing one (an int is the CPU's primitive number type), and since a temporary integer can fit in a CPU register (so the copy does not even touch memory), the compiler uses the first method for ints.
Since a class can have any size (and copying it may be more expensive), the compiler can decide to adopt the second method.
In any case, you should not care about that: the externally observable behavior will be the same, since accesses to the external and internal variables are mutually exclusive.
First, you don't need to cast your pointers to int, as operator<<() has an overload that accepts a void* for printing pointer addresses. Any pointer can be passed to << without casting (unless there is a specialized overload present for that pointer type, like the one for char*, in which case you should cast to void* instead of int). Casting to int can also truncate the address on 64-bit platforms, where pointers are wider than int.
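A small sketch of that printing advice (the helper names are made up). Streaming a `char*` prints the text it points to, so the address is printed by casting to `const void*`; if an integer really is needed, `std::uintptr_t` (optional in the standard, but provided by mainstream implementations) is wide enough to hold an object pointer, unlike `int` on typical 64-bit platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Formats a pointer the way `cout << static_cast<const void*>(p)` would.
template <typename T>
std::string format_ptr(const T* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);  // always the address, even for char*
    return out.str();
}

// The char* overload of operator<< prints the string, not the address.
std::string format_cstring(const char* s) {
    std::ostringstream out;
    out << s;
    return out.str();
}

// If an integer is really needed, use std::uintptr_t rather than int.
std::uintptr_t address_as_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}
```

With `const char* text = "hi";`, `format_cstring(text)` yields `"hi"`, whereas `format_ptr(text)` yields an implementation-specific rendering of the address.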
Second, most compilers implement Return Value Optimization when a function returns an object by value (RVO is not used for primitive types, as copies are cheap). So your code is essentially acting more like this behind the scenes:
void TestAddress(Test &result)
{
cout << "Object Address in function: " << &result << endl;
}
int main()
{
//...
Test object;
TestAddress(object);
cout << "Object Address in Main: " << &object << endl;
return 0;
}
Or even this:
void TestAddress(Test &result)
{
new (&result) Test(); // <-- constructor called here instead!
cout << "Object Address in function: " << &result << endl;
}
int main()
{
//...
Test object; // <-- constructor not called here!
TestAddress(object);
cout << "Object Address in Main: " << &object << endl;
return 0;
}
Either way, the compiler is smart enough to know that the temp object inside the function is going to end up in the caller's object, so it eliminates the temp and acts on the caller's object directly instead.
That is why you see the same memory address being reported by two output statements seemingly acting on two separate objects - they are actually the same object in memory when RVO is being used.
When you return an object by value here, both variables end up being the same object in memory (the copy is elided), so they report the same address. However, when you return a primitive by value, the primitive within the function scope is destroyed, its value having already been handed back to the caller, and that value is stored in a new memory location.
When a function returns a value, the value is put on the stack (the function's stack frame is discarded, but the return value remains there until the caller picks it up).
If the return value is on the stack, how can a move get that value without copying it into the variable's memory location?
For example in this code:
A a = getA();
In many implementations of C++, a function returning a "complex" data type is passed a hidden parameter that is a pointer to the space where the returned instance is to reside. Essentially, the compiler turns Foo r = fun(); into
alignas(Foo) char r[sizeof(Foo)]; // Foo-sized buffer, uninitialized!
fun(reinterpret_cast<Foo*>(r));   // pass the hidden result pointer
As you can see, Foo is allocated on the stack in the caller's frame. Now, within the implementation of fun there could be a copy. The construction
Foo fun() {
Foo rv;
...
return rv;
}
is generally implemented as
void fun(Foo * $ret) {
Foo rv;
...
new ($ret) Foo(rv); // copy construction
}
When the return value optimizations are applied, this gets changed to
void fun(Foo * $ret) {
Foo & rv = *(new ($ret) Foo);
...
return;
}
Now there's no copying involved. That's a mile-high overview of how an implementation might do it.
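The two lowerings above can be written out by hand to count copies. This is a sketch at the C++ level, not what a compiler literally emits; `fun_copy`, `fun_elided`, `call_and_read` and the copy counter are hypothetical names. The callee receives the address of aligned, uninitialized caller storage and constructs the result there, either by copying a local or by constructing in place.

```cpp
#include <cassert>
#include <new>

// Instrumented type: counts copy constructions.
struct Foo {
    static inline int copies = 0;
    int value = 0;
    Foo() = default;
    Foo(const Foo& other) : value(other.value) { ++copies; }
};

// "return rv;" lowered WITHOUT the optimization: build a local,
// then copy-construct it into the caller-provided buffer.
Foo* fun_copy(void* ret) {
    Foo rv;
    rv.value = 42;
    return ::new (ret) Foo(rv);   // copy construction
}

// Lowered WITH the optimization: the "local" is the object constructed
// directly in the caller's buffer, so no copy is made.
Foo* fun_elided(void* ret) {
    Foo* rv = ::new (ret) Foo;
    rv->value = 42;
    return rv;
}

// Caller side: reserve aligned, uninitialized storage, pass its address
// as the hidden argument, use the result, then end its lifetime.
int call_and_read(Foo* (*fun)(void*)) {
    alignas(Foo) unsigned char storage[sizeof(Foo)];
    Foo* result = fun(storage);
    int v = result->value;
    result->~Foo();
    return v;
}
```

Both versions hand back 42, but `Foo::copies` goes up by one only for `fun_copy`.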
The storage (heap, stack, registers, etc.) used to hold the temporary that returns a value from a function is implementation-defined. You could see it as:
+-----------------------------+-------------------------+-----------------------------------------+
| target (caller stack frame) | temporary (unspecified) | return statement (function stack frame) |
+-----------------------------+-------------------------+-----------------------------------------+
The value is passed from right to left. Also note that the standard allows any compiler to elide the temporary and the copies/moves and initialize the target directly (since C++17 this is required when the returned expression is a prvalue).
Writing a class such as:
#include <iostream>

class trace
{
public:
    trace()
    {
        std::cout << "Init" << std::endl;
    }
    ~trace()
    {
        std::cout << "Destroy" << std::endl;
    }
    trace( const trace& )
    {
        std::cout << "Copy init" << std::endl;
    }
    trace( trace&& )
    {
        std::cout << "Move init" << std::endl;
    }
    trace& operator=( const trace& )
    {
        std::cout << "Copy assign" << std::endl;
        return *this;
    }
    trace& operator=( trace&& )
    {
        std::cout << "Move assign" << std::endl;
        return *this;
    }
};
and trying it with different compiler optimizations enabled is very illustrative.
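An instrumented variant of the `trace` class that counts instead of printing makes the cases easy to compare; `Tracer`, `Counts` and the three factory functions are made-up names, and C++17 is assumed. Returning a prvalue never copies or moves; returning a named local either moves once or, with NRVO, not at all; `return std::move(t);` deliberately defeats NRVO and always moves once (compilers often warn about this pessimization). With GCC or Clang, `-fno-elide-constructors` switches off the optional elision, which typically makes `make_named` report one move while the prvalue case stays at zero.

```cpp
#include <cassert>
#include <utility>

struct Counts {
    int copies = 0;
    int moves = 0;
};

// Counts how often each copy/move operation runs.
struct Tracer {
    static inline Counts counts{};
    Tracer() = default;
    Tracer(const Tracer&) { ++counts.copies; }
    Tracer(Tracer&&) noexcept { ++counts.moves; }
    Tracer& operator=(const Tracer&) { ++counts.copies; return *this; }
    Tracer& operator=(Tracer&&) noexcept { ++counts.moves; return *this; }
};

Tracer make_prvalue() { return Tracer{}; }              // guaranteed elision (C++17)
Tracer make_named() { Tracer t; return t; }              // NRVO allowed, not required
Tracer make_moved() { Tracer t; return std::move(t); }   // blocks NRVO: always one move

// Initializes a variable from the factory and reports what happened.
Counts measure(Tracer (*make)()) {
    Tracer::counts = Counts{};
    Tracer result = make();
    (void)result;
    return Tracer::counts;
}
```

None of the three factories ever copies; only the number of moves differs.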
You're assuming that A is an aggregate or primitive. If this is true then yes move and copy semantics are equivalent. If however A is a complex type like vector then it will contain pointers to resources. When moving the object the pointers are copied without copying the value they point to.
When you say "move", I'm assuming you're referring to C++11 move constructors and the std::move function.
Moving an object doesn't actually move the entire object. It constructs a new one using its move constructor, which is allowed to take ownership of resources held by the original object instead of copying them. For example, if you write:
std::vector<int> foo = function_that_returns_a_vector();
the compiler may implement this by calling foo's move constructor and passing it the temporary vector returned by the function. The move constructor will take ownership of the temporary vector's internal pointer to its heap-allocated contents, leaving the temporary vector empty. Prior to C++11 and move support, foo's copy constructor would've been called, which would've allocated new space on the heap to copy the returned vector's contents even though that returned vector is about to be destroyed and won't need its own copy any longer.
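A small sketch of that ownership transfer (the struct and function names are made up). The standard requires `std::vector`'s move constructor to run in constant time, so it cannot copy the elements; in practice it takes over the source's buffer pointer, whereas a copy allocates a new buffer.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct TransferResult {
    bool same_buffer_after_move;
    bool same_buffer_after_copy;
};

TransferResult check_transfer() {
    std::vector<int> src(1000, 7);
    const int* original = src.data();

    std::vector<int> copied = src;            // copy: allocates a new buffer
    bool copy_same = copied.data() == original;

    std::vector<int> moved = std::move(src);  // move: takes over src's buffer
    bool move_same = moved.data() == original;

    return {move_same, copy_same};
}
```

`check_transfer()` reports that the moved-to vector reuses the original buffer and the copy does not.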
Note that the compiler won't necessarily implement that line by constructing foo from a returned temporary at all, though. Depending on the compiler's platform-specific calling convention, the address of the (uninitialized) foo variable may be passed into the function in such a way that the function's return value is constructed directly into foo, avoiding the need for a copy after the function returns. This is called copy elision.
The easiest way is to change your function like this:
void getA(A& out);
A a;
getA(a);
However, your compiler already does its best to avoid such superfluity: see copy elision.
Considering the following minimal code:
#include <iostream>
using namespace std;

class MyClass {
public:
    MyClass() {}
};

MyClass myfunc() {
    MyClass obj;
    cout << "Address of obj in myFunc " << &obj << endl;
    return obj;
}

int main() {
    MyClass obj(myfunc());
    cout << "Address of obj in main " << &obj << endl;
    return 0;
}
I obtain the following output:
Address of obj in myFunc 0x7fff345037df
Address of obj in main 0x7fff3450380f
Now, just by adding a destructor in MyClass, I get the following output:
Address of obj in myFunc 0x7fffb6aed7ef
Address of obj in main 0x7fffb6aed7ef
This shows that both objects are now the same. Is this just a coincidence?
Also, what does exactly happen in:
MyClass obj(myfunc());
I have defined a copy constructor that prints a message, but the message never appears...
By adding a destructor (whatever it was that you actually did, you're not showing the code) the behavior changed to use Return Value Optimization, known as RVO.
With RVO, a pointer to the caller's storage is passed to the function, and the function constructs the object directly in that storage, instead of e.g. copying a value in a processor register or set of registers.
The same calling convention, with a hidden result storage pointer, can also be used without RVO. Without RVO a copy or move is performed at the end of the function. The standard supports RVO optimization under certain conditions, but, while it can be reasonably expected, a compiler is not under any obligation to perform RVO.
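One likely reason adding a destructor changed the output: on the Itanium C++ ABI (used by GCC and Clang on Linux and macOS), a class with a non-trivial copy constructor, move constructor, or destructor is always returned through a hidden pointer to caller storage, while a trivially copyable class such as the empty `MyClass` may be returned in registers and then stored into a separate caller variable. The ABI details are platform-specific; the type traits below (with stand-in struct names for the class before and after adding a destructor) only show which category each version falls into.

```cpp
#include <type_traits>

// Stand-in for MyClass as first posted: user-provided default constructor only.
struct WithoutDtor {
    WithoutDtor() {}
};

// Stand-in for MyClass after adding a (user-provided) destructor.
struct WithDtor {
    WithDtor() {}
    ~WithDtor() {}
};

// A user-provided default constructor does not affect trivial copyability...
static_assert(std::is_trivially_copyable_v<WithoutDtor>);
// ...but a user-provided destructor makes the type non-trivial.
static_assert(!std::is_trivially_copyable_v<WithDtor>);
static_assert(!std::is_trivially_destructible_v<WithDtor>);
```

The assertions are checked at compile time, so the file compiles only if the classification holds.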
I have asked this question. My question now is: how does this work? To elaborate, how can I point to an object that is not yet initialised? I have made this MWE, and it shows that the object is copy-constructed, not copy-assigned, i.e. the object is not yet initialised, yet I am able to point to it.
#include <iostream>
class Foo {
public:
int x;
Foo(const Foo& ori_foo) {
std::cout << "constructor" << std::endl;
x = ori_foo.x;
}
Foo& operator = (const Foo& ori_foo) {
std::cout << "operator =" << std::endl;
x = ori_foo.x;
return *this;
}
Foo(int new_x) {
x = new_x;
}
};
class BarParent {
public:
Foo *p_foo;
BarParent(Foo* new_p_foo) : p_foo(new_p_foo)
{
std::cout << (*new_p_foo).x << std::endl;
}
};
class BarChild : public BarParent {
public:
Foo foo;
BarChild(Foo new_foo)
:BarParent(&foo) //pointer to member not yet initialised
,foo(new_foo) // order of initialization POINT OF INTEREST
{}
};
int main() {
Foo foo(101);
BarChild bar(foo);
std::cout << bar.p_foo->x << std::endl;
std::cout << bar.foo.x << std::endl;
}
Output:
constructor
0
constructor
101
101
Feel free to go into the details of how the memory is handled and where every member resides.
Don't mistake initialization for allocation. BarChild::foo is allocated before the constructor is called, since it is stored in place inside the BarChild object, so there is a well-defined location for BarParent::p_foo to point at. Foo's constructor initializes BarChild::foo, but as long as you don't try to read from BarChild::foo before that constructor has run, you will not notice the ordering. (The MWE's BarParent constructor does read it, which is undefined behavior; the 0 it printed is not guaranteed.)
At this line
BarChild bar(foo);
the compiler reserves enough stack space for a BarChild object, then calls the constructor to begin the object's lifetime. Within the object the foo member has a fixed offset (every BarChild object has a foo member at the same offset within it), so since the this pointer has a known address within the constructor (it's the address of bar on the stack), this->foo is also at a known address, even if the memory at that address hasn't been initialized yet. The BarChild object doesn't "grow bigger" as each member is initialized; its size is fixed, and space for all the members is already "reserved" before they're initialized.
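A trimmed-down variant of the MWE that keeps only the well-defined part: the base class stores the address of the not-yet-constructed member but never reads through it during construction. The names follow the question; the `int` constructor parameter is a simplification.

```cpp
#include <cassert>

struct Foo {
    int x;
    explicit Foo(int new_x) : x(new_x) {}
};

struct BarParent {
    Foo* p_foo;
    // Only stores the pointer. Reading p_foo->x here would be undefined
    // behaviour, because the derived class's member is not initialised yet.
    explicit BarParent(Foo* new_p_foo) : p_foo(new_p_foo) {}
};

struct BarChild : BarParent {
    Foo foo;
    explicit BarChild(int x)
        : BarParent(&foo)  // storage for foo already exists at a fixed offset
        , foo(x)           // foo is initialised only after the base class
    {}
};
```

After `BarChild bar(101);`, `bar.p_foo == &bar.foo` holds and `bar.p_foo->x` reads 101, because by then the member has been constructed.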
This is somewhat analogous to:
alignas(Foo) unsigned char raw_memory[sizeof(Foo)]; // define block of uninitialized, suitably aligned memory
unsigned char* addr = raw_memory;                   // take address of uninitialized memory
new (raw_memory) Foo(101);                          // initialize memory
The object doesn't exist yet, and it's not valid to use it, but its address is known in advance of it being initialized.