c++ puzzle about return value by reference

set<string> foo() {
    set<string> a;
    // operations on a
    return a;
}
Is there any performance difference if I do:
set<string>& foo() {
    set<string> a;
    // operations on a
    return a;
}
If so, my understanding is that a will be allocated on the stack. After foo() returns,
that memory will be reclaimed. How can we refer to memory that has been reclaimed?

In case B, any actual use of the returned value results in undefined behavior. You are not allowed to return a local automatic variable by reference and expect anyone to be able to access it.
See this live workspace warning message. Your compiler will usually warn you when you do something like that, but you should not rely on the warning.
Note that in C++11, the first example is highly efficient. Between rvalue-reference-based move and NRVO, the cost to return a local std::set<std::string> by value is on the order of a few ifs and copying a few words of memory, or in some cases less (i.e., actually zero cost, because the std::set gets constructed in the caller's scope directly).
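As a minimal sketch of the by-value version (make_names and count_names are hypothetical names, not from the question), the function below builds a set locally and returns it by value. The compiler may construct the set directly in the caller's storage, and otherwise it is moved rather than deep-copied:

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: builds a set locally and returns it by value.
std::set<std::string> make_names() {
    std::set<std::string> names;
    names.insert("alpha");
    names.insert("beta");
    names.insert("gamma");
    return names;  // plain return: eligible for NRVO, otherwise moved
}

// The caller owns an independent object, so nothing can dangle.
std::size_t count_names() {
    std::set<std::string> result = make_names();
    return result.size();
}
```

Writing return std::move(names); here is not needed and can prevent NRVO.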

There is no point in discussing the performance of code that is not correct, like your second snippet, whose behaviour is undefined (in practice this will often make your program crash). You cannot return references to automatic objects (allocated on the stack) that are local to the function.
It would be ok if, for example, your foo function was a member function of a class, and was returning a reference to a member of the class:
class C {
public:
    set<string>& foo1() { return a_; }
    set<string> foo2() { return a_; }
private:
    set<string> a_;
};
In the above example foo1 will be more efficient than foo2 because it does not create a new object; it just returns a reference to the existing one.
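To make the difference between the two accessors concrete, here is a small sketch. The helper reference_aliases_member is a made-up name for illustration; it checks that foo1() aliases the member while foo2() hands out an independent copy:

```cpp
#include <set>
#include <string>

class C {
public:
    std::set<std::string>& foo1() { return a_; }  // alias to the member
    std::set<std::string> foo2() { return a_; }   // independent copy
private:
    std::set<std::string> a_;
};

// Hypothetical check: returns true when foo1() aliases the member
// and foo2() does not.
bool reference_aliases_member() {
    C c;
    c.foo1().insert("x");                   // modifies c's own set
    std::set<std::string> copy = c.foo2();  // copies the set
    copy.insert("y");                       // modifies only the copy
    return c.foo1().size() == 1 && copy.size() == 2;
}
```

The reference returned by foo1() is only valid while the C object itself is alive, so the same lifetime rule still applies, just one level up.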

Related

I can't understand this copy constructor behaviour

I'm having some weird behaviour with the following:
using namespace std;
struct Number
{
Number(int init) : a(init) {}
Number() {};
int a;
static Number& getNumber() { return Number(555); }
//Number(const Number& other)
//{
// a = other.a;
//} // I've commented this out you'll see why
};
int main()
{
Number num1; // Is now junk
Number num2; // Is now junk
num2 = Number::getNumber(); // num 2 is still junk
Number num3 = Number::getNumber(); // num3 has been assigned 555.
// If I define my own copy constructor
// (uncomment it), it stays junk.
cout << num3.a;
}
When I make my own copy constructor, whether it takes a const or not, the value coming in in the "other" argument is junk. I don't get this behaviour if the copy constructor is the default one. I tried it on IDEOne using GCC and this code doesn't compile. However on my Visual Studio it runs as I described.
I find it really hard to understand the rules of how long a temporary is still valid. For example, I thought that if getNumber() returns a reference to a local temporary, it's OK if it's assigned directly on the same line. I was wrong.

getNumber has undefined behavior. You are returning a reference to a local object. When the function returns, that object is destroyed, so you now have a reference to an object that no longer exists. To fix this, we can just return by value:
static Number getNumber() { return {555}; }
And now the Number that is returned is constructed directly from the value in the return statement.
All function local variables are destroyed after the return value is created but before execution proceeds. That means returning any type of reference or pointer to a local object will leave you with a dangling reference/pointer.
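A sketch of the question's struct with that fix applied might look as follows. The a(0) initialiser in the default constructor and the read_number helper are additions for illustration, so that no member is ever read uninitialised:

```cpp
struct Number {
    Number(int init) : a(init) {}
    Number() : a(0) {}  // assumption: zero-initialise instead of leaving junk
    int a;
    // Returns by value: the caller receives its own Number object.
    static Number getNumber() { return Number(555); }
};

// Hypothetical helper exercising both assignment and initialisation.
int read_number() {
    Number num2;
    num2 = Number::getNumber();         // copy-assigned from the returned value
    Number num3 = Number::getNumber();  // initialised from the returned value
    return num2.a + num3.a;
}
```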

I learned something trying to answer this.
What are the possible ways you could return an object from your static function?
By value
This is usually the correct way. Often the compiler will use a Return-Value-Optimisation or otherwise avoid actually copying the object multiple times. If you're not sure this is probably what you want.
If the default copy constructor isn't sufficient for you, make sure you define your own, remembering to define it to take a const ref argument. If you're using C++11, defining a separate move constructor may be useful.
By non-const reference
This is incorrect because it's a reference (effectively a pointer) to a memory location that used to contain a variable which doesn't exist any more.
It's an error in GCC. It used to be allowed in Visual Studio, although I hear it may not be any more. You can compile with the compiler option /Za to turn off various Microsoft-specific extensions if you want to.
By const reference
This is not an error, but is a warning and is undefined behaviour [citation needed]
You can bind a const ref to a temporary object. See Herb Sutter's article: https://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/
For example, in "const Number& num = get_number_by_value()" the function returns a temporary object (the copy is usually elided), the temporary is bound to the reference, and its lifetime is extended in a way that works specifically for const references (but not for non-const lvalue references or pointers).
However, as I just learned while looking it up, this applies only when the reference binds directly to a temporary returned by value. The lifetime is not lengthened any further if a function returns a const reference and that result is then bound to another const reference.
So your case
Number num = get_number_by_const_ref()
may work ok but
const Number& num = get_number_by_const_ref()
may not.
Return a const reference to a static member variable
This is not usually helpful, but if your class is very expensive to construct (requires a lot of calculation or uses GB of memory) and you want to return a particular instance of it multiple times, you might have a private const static member variable of the class which stores an instance you can return by ref.
Remember, if you have a static member variable containing an instance of the class, it needs to be defined outside the class in a source (.cpp) file, so that the constructor is available at that point.
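A minimal sketch of the last option, using a hypothetical Config class (none of these names come from the question): the static member is declared inside the class, defined once at namespace scope, and handed out by const reference.

```cpp
#include <string>
#include <utility>

class Config {
public:
    // Returns a reference to the single shared instance; no copy is made.
    static const Config& defaults() { return defaults_; }
    const std::string& name() const { return name_; }
private:
    explicit Config(std::string name) : name_(std::move(name)) {}
    std::string name_;
    static const Config defaults_;  // declaration only
};

// Definition: this line belongs in exactly one .cpp file.
const Config Config::defaults_("default");
```

Because the object has static storage duration, a reference to it stays valid for the rest of the program, unlike a reference to a local.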

The function static Number& getNumber() { return Number(555); } creates a temporary Number and returns a reference to it. The temporary object ceases to exist at the end of the function, meaning the reference you are returning now refers to a destroyed object. This is undefined behavior, which means the behavior could be anything, including appearing to work sometimes. The compiler is not required to diagnose this error, but some (such as GCC) do. If you intend to return a mutable reference to a shared instance, declare a static local object in the body of the function and return a reference to it.
static Number& getNumber()
{
static Number my_instance{555};
return my_instance;
}
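As a rough illustration of what the static-local approach implies, the sketch below checks that every call returns the same object, so a change made through one reference is visible through the next. The helper shared_instance_is_visible is a hypothetical name added for the demonstration:

```cpp
struct Number {
    explicit Number(int init) : a(init) {}
    int a;
    static Number& getNumber() {
        static Number my_instance{555};  // constructed once, on first call
        return my_instance;              // outlives the function call
    }
};

// Hypothetical check: both references name the same shared object.
bool shared_instance_is_visible() {
    Number& first = Number::getNumber();
    first.a = 7;                          // mutates the shared instance
    Number& second = Number::getNumber();
    return &first == &second && second.a == 7;
}
```

This is shared mutable state, so it is only appropriate when sharing is what you actually want.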

why use a const non-reference when const reference lifetime is the length of the current scope

So in C++, if you assign the return value of a function to a const reference, then the lifetime of that return value will be the scope of that reference. E.g.
MyClass GetMyClass()
{
return MyClass("some constructor");
}
void OtherFunction()
{
const MyClass& myClass = GetMyClass(); // lifetime of return value is until the end
// of scope due to magic const reference
doStuff(myClass);
doMoreStuff(myClass);
}//myClass is destructed
So it seems that wherever you would normally assign the return value from a function to a const object you could instead assign to a const reference. Is there ever a case in a function where you would want to not use a reference in the assignment and instead use an object? Why would you ever want to write the line:
const MyClass myClass = GetMyClass();
Edit: my question has confused a couple people so I have added a definition of the GetMyClass function
Edit 2: please don't try and answer the question if you haven't read this:
http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/

If the function returns an object (rather than a reference), making a copy in the calling function is necessary [although optimisation steps may be taken that means that the object is written directly into the resulting storage where the copy would end up, according to the "as-if" principle].
In the sample code const MyClass myClass = GetMyClass(); this "copy" object is named myclass, rather than a temporary object that exists, but isn't named (or visible unless you look at the machine-code). In other words, whether you declare a variable for it, or not, there will be a MyClass object inside the function calling GetMyClass - it's just a matter of whether you make it visible or not.
Edit2:
The const reference solution will appear similar (not identical, and this really just written to explain what I mean, you can't actually do this):
MyClass __noname__ = GetMyClass();
const MyClass &myclass = __noname__;
It's just that the compiler generates the __noname__ variable behind the scenes, without actually telling you about it.
By making a const MyClass myclass the object is made visible and it's clear what is going on (and that the GetMyClass is returning a COPY of an object, not a reference to some already existing object).
On the other hand, if GetMyClass does indeed return a reference, then it is certainly the correct thing to do.
In some compilers, using a reference may even add an extra memory read when the object is being used, since the reference "is a pointer" [yes, I know, the standard doesn't say that, but please, before complaining, do me a favour and show me a compiler that DOESN'T implement references as pointers with extra sugar to make them taste sweeter]. So to use a reference, the compiler has to read the reference value (the pointer to the object) and then read the value inside the object through that pointer. In the case of the non-reference, the object itself is "known" to the compiler as a direct object, not a reference, saving that extra read. Sure, most compilers will optimise such an extra indirection away MOST of the time, but they can't always do that.
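To see that the hidden temporary really lives as long as the reference, here is a small sketch with a hypothetical Tracked type that counts how many of its objects are currently alive. All names are made up for illustration:

```cpp
struct Tracked {
    static int alive;  // number of currently living Tracked objects
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

Tracked make_tracked() { return Tracked(); }

// Count observed while a const reference is bound to the returned temporary.
int alive_while_bound() {
    const Tracked& ref = make_tracked();  // temporary's lifetime is extended
    (void)ref;
    return Tracked::alive;                // the temporary still exists here
}

// Count observed after the returned temporary was simply discarded.
int alive_after_discard() {
    make_tracked();         // temporary is destroyed at the end of this statement
    return Tracked::alive;
}
```

Counting living objects rather than destructor calls keeps the result the same whether or not the compiler elides copies.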

One reason would be that the reference may confuse other readers of your code. Not everybody is aware of the fact that the lifetime of the object is extended to the scope of the reference.

The semantics of:
MyClass const& var = GetMyClass();
and
MyClass const var = GetMyClass();
are very different. Generally speaking, you would only use the
first when the function itself returns a reference (and is
required to return a reference by its very semantics). And you
know that you need to pay attention to the lifetime of the
object (which is not under your control). You use the second
when you want to own (a copy of) the object. Using the first
in this case is misleading, can lead to surprises (if the
function also returns a reference to an object which is
destructed earlier) and is probably slightly less efficient
(although in practice, I would expect both to generate exactly
the same code if GetMyClass returns by value).
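The difference in semantics shows up as soon as the function returns a reference to an object somebody else owns. In this sketch (Registry and copy_is_independent are hypothetical names), the reference observes a later change while the copy keeps the old value:

```cpp
#include <string>
#include <utility>

class Registry {
public:
    explicit Registry(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }  // returns a reference
    void rename(const std::string& n) { name_ = n; }
private:
    std::string name_;
};

// Hypothetical check: the reference follows the owner, the copy does not.
bool copy_is_independent() {
    Registry reg("old");
    const std::string& ref = reg.name();  // alias of reg's member
    const std::string copy = reg.name();  // snapshot owned by this scope
    reg.rename("new");
    return ref == "new" && copy == "old";
}
```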

Performance
As most current compilers elide copies (and moves), both versions should have about the same efficiency:
const MyClass& rMyClass = GetMyClass();
const MyClass oMyClass = GetMyClass();
In the second case, either a copy or move is required semantically, but it can be elided per [class.copy]/31. A slight difference is that the first one works for non-copyable non-movable types.
It has been pointed out by Mats Petersson and James Kanze that accessing the reference might be slower for some compilers.
Lifetime
References should be valid during their entire scope just like objects with automatic storage are. This "should" of course is meant to be enforced by the programmer. So for the reader IMO there's no differences in the lifetimes implied by them. Although, if there was a bug, I'd probably look for dangling references (not trusting the original code / the lifetime claim for the reference).
In the case GetMyClass could ever be changed (reasonably) to return a reference, you'd have to make sure the lifetime of that object is sufficient, e.g.
SomeClass* p = /* ... */;
void some_function(const MyClass& a)
{
/* much code with many side-effects */
delete p;
a.do_something(); // oops!
}
const MyClass& r = p->get_reference();
some_function(r);
Ownership
A variable directly naming an object like const MyClass oMyClass; clearly states I own this object. Consider mutable members: if you change them later, it's not immediately clear to the reader that's ok (for all changes) if it has been declared as a reference.
Additionally, for a reference, it's not obvious that the object it's referring to does not change. A const reference only implies that you won't change the object, not that nobody will change the object(*). A programmer would have to know that this reference is the only way of referring to that object, by looking up the definition of that variable.
(*) Disclaimer: try to avoid unapparent side effects

I don't understand what you want to achieve. The reason that a T const& can be bound to a T returned by value from a function is to make it possible for other functions to take this temporary as a T const& argument. This saves you from having to create overloads. But the returned value has to be constructed anyway.
But today (with C++11) you can use const auto myClass = GetMyClass();.
Edit:
As an example of what can happen I will present something:
MyClass version_a();
MyClass const& version_b();
const MyClass var1 = version_a();
const MyClass var2 = version_b();
const MyClass& var3 = version_a();
const MyClass& var4 = version_b();
const auto var5 = version_a();
const auto var6 = version_b();
var1 is initialised with the result of version_a()
var2 is initialised with a copy of the object referred to by the reference that version_b() returns
var3 holds a const reference to the temporary which is returned, and extends its lifetime
var4 is initialised with the reference returned from version_b()
var5 same as var1
var6 same as var2, because plain auto never deduces a reference (you would need const auto& to get the same as var4)
They are all semantically different. var3 works for the reason I gave above. var5 and var6 deduce their type automatically, but always as an object type, never as a reference.
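The deduction rules can be checked at compile time. In the sketch below, MyClass is a stand-in with a single int member and var7 is an extra variable added for contrast; const auto yields a copy even when the function returns a reference, while const auto& keeps the reference:

```cpp
#include <type_traits>

struct MyClass { int value; };  // stand-in type for illustration

MyClass version_a() { return MyClass{1}; }
const MyClass& version_b() {
    static const MyClass instance{2};  // object with static storage duration
    return instance;
}

bool auto_copies_but_auto_ref_aliases() {
    const auto var5 = version_a();   // const MyClass
    const auto var6 = version_b();   // const MyClass: a copy, not a reference
    const auto& var7 = version_b();  // const MyClass&: refers to instance
    static_assert(std::is_same<decltype(var5), const MyClass>::value, "object");
    static_assert(std::is_same<decltype(var6), const MyClass>::value, "object");
    static_assert(std::is_same<decltype(var7), const MyClass&>::value, "reference");
    return var5.value == 1 && &var6 != &version_b() && &var7 == &version_b();
}
```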

There is a major implication regarding the destructor actually being called. Check GotW #88, Q3 and A3. I put everything in a small test program (Visual C++, so forgive the stdafx.h)
// Gotw88.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
class A
{
protected:
bool m_destroyed;
public:
A() : m_destroyed(false) {}
~A()
{
if (!m_destroyed)
{
std::cout<<"A destroyed"<<std::endl;
m_destroyed=true;
}
}
};
class B : public A
{
public:
~B()
{
if (!m_destroyed)
{
std::cout<<"B destroyed"<<std::endl;
m_destroyed=true;
}
}
};
B CreateB()
{
return B();
}
int _tmain(int argc, _TCHAR* argv[])
{
std::cout<<"Reference"<<std::endl;
{
const A& tmpRef = CreateB();
}
std::cout<<"Value"<<std::endl;
{
A tmpVal = CreateB();
}
return 0;
}
The output of this little program is the following:
Reference
B destroyed
Value
B destroyed
A destroyed
Here a small explanation for the setup. B is derived from A, but both have no virtual destructor (I know this is a WTF, but here it's important). CreateB() returns B by value. Main now calls CreateB and first stores the result of this call in a const reference of type A. Then CreateB is called and the result is stored in a value of type A.
The result is interesting. First - if you store by reference, the correct destructor is called (B), if you store by value, the wrong one is called. Second - if you store in a reference, the destructor is called only once, this means there is only one object. By value results in 2 calls (to different destructors), which means there are 2 objects.
My advice - use the const reference. At least on Visual C++ it results in less copying. If you are unsure about your compiler, use and adapt this test program to check the compiler. How to adapt? Add copy / move constructor and copy-assignment operator.
I quickly added copy & assignment operators for class A & B
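A(const A& rhs)
{
    m_destroyed = false; // the flag is not copied, so set it explicitly
    std::cout<<"A copy constructed"<<std::endl;
}
A& operator=(const A& rhs)
{
    std::cout<<"A copy assigned"<<std::endl;
    return *this; // a non-void function must return a value
}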
(same for B, just replace every capital A with B)
this results in the following output:
Reference
A constructed
B constructed
B destroyed
Value
A constructed
B constructed
A copy constructed
B destroyed
A destroyed
This confirms the results from above (please note, the A constructed results from B being constructed as B is derived from A and thus As constructor is called whenever Bs constructor is called).
Additional tests: Visual C++ accepts also the non-const reference with the same result (in this example) as the const reference. Additionally, if you use auto as type, the correct destructor is called (of course) and the return value optimization kicks in and in the end it's the same result as the const reference (but of course, auto has type B and not A).
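For a check that does not depend on Visual C++ or on how many copies a particular compiler elides, one option is to count living B objects instead of printing from destructors. This is only a sketch with simplified A and B classes and made-up helper names:

```cpp
struct A {
    int id;
    A() : id(1) {}
};

struct B : A {
    static int live;  // number of currently living B objects
    B() { ++live; }
    B(const B& other) : A(other) { ++live; }
    ~B() { --live; }
};
int B::live = 0;

B CreateB() { return B(); }

// Binding to const A& keeps the complete B temporary alive.
int live_b_with_reference() {
    const A& ref = CreateB();
    (void)ref;
    return B::live;
}

// Initialising an A copies only the A part (slicing); the B temporary
// is destroyed at the end of the initialising statement.
int live_b_with_value() {
    A val = CreateB();
    (void)val;
    return B::live;
}
```

With the reference there is still one B alive inside the scope. With the A value there is none, because only a sliced A copy remains.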

const_cast a const member in a class constructor

I sometimes use const_cast when I want a member variable of a class to be constant during the life of the class, but it needs to be mutable during the constructor. Example:
struct qqq {
    const vector<foo> my_foo;
    qqq(vector<foo>* other) {
        vector<foo>& mutable_foo = const_cast<vector<foo>&>(my_foo);
        other->swap(mutable_foo);
    }
};
I had assumed that doing this in the constructor was basically OK because nobody else is relying on it at this point so it wouldn't interact badly with optimization, etc.
However recently someone told me this is "undefined behavior" and that it's basically illegal to mutate a const object after it's been constructed under any circumstance.
Can someone clarify? Is this a bad / undefined behavior / thing to do?

It is Undefined Behavior. Per Paragraph 7.1.6.1/4 of the C++11 Standard:
Except that any class member declared mutable (7.1.1) can be modified, any attempt to modify a const
object during its lifetime (3.8) results in undefined behavior.
In this case, it seems like you want your object to "become" constant after construction. This is not possible.
If your vector is meant to be const, you shall initialize it in the constructor's initialization list:
qqq(vector<foo>& other)
: my_foo(std::move(other))
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^
{
}
Notice, that unless you have a good reason for passing by pointer - in which case, you should also be checking whether the pointer is non-null - you should consider passing by reference (as shown above), which is the common practice.
UPDATE:
As Pete Becker correctly points out in the comments, proper design would suggest that the decision to move from the vector argument should belong to the caller of qqq's constructor, and not to the constructor itself.
If the constructor is always supposed to move from its argument, then you could let it accept an rvalue reference, making it clear what the constructor itself is expecting out of the caller:
qqq(vector<foo>&& other)
// ^^
: my_foo(std::move(other))
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^
{
}
This way, the caller would have to provide an rvalue in input to qqq's constructor:
std::vector<foo> v;
// ...
qqq q1(v); // ERROR!
qqq q2(std::move(v)); // OK! Now the client is aware that v must be moved from
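If the caller should be free to either copy or move, a commonly used alternative is to take the parameter by value and move it into the const member. This is a sketch of that variation, not something proposed above; the foo definition is a placeholder, since the question does not show it, and caller_chooses is a hypothetical helper:

```cpp
#include <utility>
#include <vector>

struct foo { int id; };  // placeholder: the question does not define foo

struct qqq {
    const std::vector<foo> my_foo;
    // By-value parameter: the caller decides whether it is copied or moved in.
    explicit qqq(std::vector<foo> other) : my_foo(std::move(other)) {}
};

bool caller_chooses() {
    std::vector<foo> v(3);
    qqq copied(v);             // v is copied; the caller keeps its data
    bool kept = v.size() == 3;
    qqq moved(std::move(v));   // the caller explicitly gives v away
    return kept && copied.my_foo.size() == 3 && moved.my_foo.size() == 3;
}
```

The const member is initialised in the member initialiser list in both cases, so no const_cast is involved.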

Yes, it's indeed UB (Undefined Behaviour). You cannot modify a const object once it's initialised. What you should do is use the member initialiser list, perhaps together with a function:
struct qqq {
const vector<foo> my_foo;
qqq(vector<foo> *other) : my_foo(initialiseFoo(*other)) {}
static vector<foo> initialiseFoo(vector<foo> &other) {
vector<foo> tmp;
other.swap(tmp);
return tmp;
}
};
A decent optimiser should be able to get rid of the temporary.
If you can use C++11, it's actually even simpler:
struct qqq {
const vector<foo> my_foo;
qqq(vector<foo> *other) : my_foo(std::move(*other))
{
other->clear(); //Just in case the implementation of moving vectors is really weird
}
};

Binding rvalue-reference to a local variable in VC++

I'm using VC++2012 to run the following code:
#include <utility>
struct A
{
int* m_p;
A() { m_p = new int; }
~A() { delete m_p; }
A(const A& otherA)
{
m_p = new int;
// BOOM!
*m_p = *otherA.m_p;
}
};
A&& CreateA()
{
A a;
return std::move(a);
}
int _tmain(int argc, _TCHAR* argv[])
{
A a2 = CreateA();
return 0;
}
During the creation of a2 A's copy ctor is called - and crashes, since the source object created in CreateA() is already destroyed.
Is this standard behaviour? Could this be a compiler bug??
Notice that if you change a2's type from 'A' to 'const A&' the crash doesn't occur - which reinforces the suspicion that it is indeed a bug.
Can anyone shed some light on this?
Note: I'm fully aware this is not the intended usage for rvalue-refs, and this example is contrived. Just hoping to get a better grasp on the behaviour of this new type.

Look at what happens in your code:
CreateA() is called
inside the function, a local variable of type A is created.
you create an rvalue reference pointing to this local variable
you return this rvalue reference
as you return, the object of type A, which the rvalue reference points to, goes out of scope, and gets destroyed
the reference now points to a destroyed object
you try to initialize a2 as a copy of the object that once existed inside the function call
And... that doesn't work. The object you're trying to copy is dead and gone. Undefined behavior.
Don't do that. :)
In C++, references do not affect the lifetime of the referenced object. There is no "I'm pointing at this object, so you can't destroy it!".
Never return references to local objects. It doesn't work, so... just don't do it.

You cannot access a local variable outside its scope. Rvalue references don't change that: they are still references. The code presented has undefined behaviour because it returns a reference to a local variable and then accesses it.
Don't return rvalue references. That is silly the vast majority of the time. Return values instead:
A CreateA()
{
A a;
return a; // a move here is automatic
// unless you are using a compiler with outdated rules like MSVC
//return std::move(a); // ok, poor MSVC
// alternatively:
//return A{}; //or
//return A();
}
When you write A const& a2 = CreateA(); nothing crashes, because you don't actually access any object. All you do is grab a dangling reference. However, this code is not even well-formed; it just happens to compile because MSVC has some outdated rules for reference binding.
So, basically, these behaviours are a mix of compiler bugs and undefined behaviour :)
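A sketch of the return-by-value version that also avoids the manual new/delete: here the struct holds a std::unique_ptr, which is a substitution for the raw pointer in the question, and the value 42 and the read_created helper are arbitrary additions for the demonstration.

```cpp
#include <memory>

struct A {
    std::unique_ptr<int> m_p;   // owns the int; no hand-written destructor
    A() : m_p(new int(42)) {}   // 42 is an arbitrary illustrative value
};

A CreateA() {
    A a;
    return a;  // moved (or elided); A is not copyable, so no copy can happen
}

int read_created() {
    A a2 = CreateA();  // a2 owns the int that was allocated inside CreateA
    return *a2.m_p;
}
```

Since the returned object takes over ownership of the allocation, there is no destroyed source object left to read from.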

C++: Will structs be copied implicitly

I'm relatively new to C++ and I'm wondering if structs are copied in the following case:
struct foo {
    int i;
    std::vector<int> bar;
};
class Foobar {
    foo m_foo;
public:
    void store(foo& f) {
        this->m_foo = f;
    }
};
int main() {
    Foobar foobar;
    {
        foo f;
        f.i = 1;
        f.bar.push_back(2);
        foobar.store(f);
    }
    // will a copy of f still exist in foobar.m_foo, or am I storing a NULL-Pointer at this point?
}
The reason why I am asking this is that I am originally a .NET developer and in .NET structures will be copied if you pass them to a function (and classes are not).
I'm pretty sure it would be copied if store was not declared to take f by reference, but I cannot change this code.
Edit: Updated the code, because I didn't know that the vector.insert would affect my question. In my case I store the struct as a member in a class, not a vector.
So my question really was: will f be copied at this->m_foo = f;?

Short answer: Yes.
Long answer: You'd have to get a pointer to a stack allocated struct and then let that struct go out of scope in order to end up with a dangling reference in your vector... but even then, you wouldn't have stored a NULL. C and C++ pointers are simple things, and will continue to point at a memory location long after that memory location has become invalid, if your code doesn't overwrite them.
It might also be worth noting that std::vector has a decent set of copy and move functions associated with it that will be called implicitly in this case, so the bar vector inside the struct will also be copied along with the simple integer i. Standard library classes tend to be quite well written, but code by other folk has no such guarantee!
Now, as regards your edit:
class Foobar {
    foo m_foo;
    void store(foo& f) {
        this->m_foo = f;
    }
};
You will still not have any problems with the foo instance stored in m_foo. This is because this->m_foo = f invokes a copying operation, as m_foo is not a variable of a reference or pointer type. If you had this instead: foo& m_foo then you would run into difficulties because instead of copying a foo instance you are instead copying a reference to a foo instance, and when that instance goes out of scope, the reference is no longer valid.
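A small sketch that checks this: the stored() accessor and the copy_survives_scope helper are additions for the test, everything else follows the question's code.

```cpp
#include <vector>

struct foo {
    int i;
    std::vector<int> bar;
};

class Foobar {
public:
    void store(foo& f) { m_foo = f; }  // copy-assignment: copies i and bar
    const foo& stored() const { return m_foo; }  // accessor added for the check
private:
    foo m_foo;  // an object, not a reference or pointer
};

bool copy_survives_scope() {
    Foobar foobar;
    {
        foo f;
        f.i = 1;
        f.bar.push_back(2);
        foobar.store(f);
    }  // f is destroyed here; foobar still holds its own copy
    const foo& s = foobar.stored();
    return s.i == 1 && s.bar.size() == 1 && s.bar[0] == 2;
}
```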

Yes, the struct will be copied, in the following function:
foos.insert(f);
As a copy is made, you won't be storing a null pointer / null reference.
However, like you've said, it won't be copied when you call store(f); as the function accepts the argument as a reference.
Your edit will still make a copy of Foo. You are assigning one instance of a variable to another instance of a variable. What you aren't doing is assigning one pointer (reference in C#) to another. You could probably do with doing some reading around C++ object instances, pointers, and references.

A copy of f is made during foos.insert(f)
void store(foo& f) {
    foos.insert(f);
}
int main() {
    {
        foo f;
        f.i = 1;
        f.bar.push_back(2);
        store(f);
    }
    // at this place, local variable `f` runs out of scope, it's destroyed and cleaned up
    // foos is holding the copy of `f`
}
