I have a TestClass with a const& member variable. I know from various places and own experiences that it is a bad idea to initialize this const& with the reference to a temporary value. So I was quite suprised that the following code will compile fine (tested with gcc-4.9.1, clang-3.5, and scan-build-3.5) but fail to run properly.
class TestClass {
public:
// removing the "reference" would remove the temporary-problem
const std::string &d;
TestClass(const std::string &d)
: d(d) {
// "d" is a const-ref, cannot be changed at all... if it is assigned some
// temporary value it is mangled up...
}
};
int main() {
// NOTE: the variable "d" is a
// temporary, whose reference is not valid... what I don't get in the
// moment: why does no compiler warn me?
TestClass dut("d");
// and printing what we got:
std::cout << "beginning output:\n\n";
// this will silently abort the program (gcc-4.9.1) or be empty
// (clang-3.5) -- don't know whats going on here...
std::cout << "dut.d: '" << dut.d << "'\n";
std::cout << "\nthats it!\n";
return 0;
}
Why does none of the two compilers warn me at compile-time? See also this ideone, with some more testing going on.
No warning as no offense:
local const references prolong the lifespan of the variable.
The standard specifies such behavior in §8.5.3/5, [dcl.init.ref], the section on initializers of reference declarations. The lifetime extension is not transitive through a function argument. §12.2/5 [class.temporary]:
The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is
the complete object to a subobject of which the temporary is bound
persists for the lifetime of the reference except as specified below.
A temporary bound to a reference member in a constructor’s
ctor-initializer (§12.6.2 [class.base.init]) persists until the
constructor exits. A temporary bound to a reference parameter in a
function call (§5.2.2 [expr.call]) persists until the completion of
the full expression containing the call.
You can have a look at gotw-88 for an extended and more readable discussion on this topic.
Undefined Behaviour
So is it your code correct? Nope, and its execution will lead to undefined behaviour. The real problem in your code snapshot is that the Undefined Behaviour is caused by the mix of two perfectly legal operations: the call of the constructor passing a temporary object (whose life spans inside the constructor block) and the binding of the reference in the constructor definition.
The compiler is not smart enough to detect this explosive statement combination, so this is why you don't get any warning.
Binding a const & to a temporary is valid and the compiler will ensure that the temporary will live at least as long as the reference. This allows you to do things like pass string literals into functions expecting a const std::string &.
In your case however you are copying that reference and thus the lifetime guarantee no longer holds. Your constructor exits and the temporary is destroyed and you are left with a reference to invalid memory.
The problem is that there is no single point in which a warning would be warranted. It's only the combination of the call of the constructor and its implementation that leads to Undefined Behaviour.
If you consider just the constructor:
class TestClass {
public:
const std::string &d;
TestClass(const std::string &d)
: d(d)
{}
};
There's nothing wrong here, you got a reference and you're storing one. Here's an example of perfectly valid use:
class Widget {
std::string data;
TestClass test;
public:
Widget() : data("widget"), test(data)
{}
};
If you consider just the call site:
//Declaration visible is:
TestClass(const std::string &d);
int main() {
TestClass dut("d");
}
Here, the compiler doesn't "see" (in the general case) the definition of the constructor. Imagine an alternative:
struct Gadget {
std::string d;
Gadget(cosnt std::string &d) : d(d) {}
};
int main()
{
Gadget g("d");
}
Surely you wouldn't want a warning here either.
To summarise, both the call site and the constructor implementation are perfectly usable as-is. It's only their combination which causes issues, but that combination is beyond the context a compiler can reasonably use to emit warnings.
TestClass(const std::string &d1)
: d(d1) {
TestClass dut("d");
I guess following is happening logically:-
1) Your string literal ("d") would be implicitly converted to std::string ( Let's give it a name 'x' ).
2) So, 'x' is a temporary which is bound to d1 here. Lifetime of this temporary is extended to lifetime of your d1. Although that string literal would always be alive till the end of program.
3) Now you're making 'd' refer to 'd1'.
4) At the end of your constructor d1's lifetime is over and so is d's.
All compiler's are not so clever to figure out these minor glitches...
Related
In the new C++20 standard, cpprefrence says:
a temporary bound to a reference in a reference element of an aggregate initialized using direct-initialization syntax (parentheses) as opposed to list-initialization syntax (braces) exists until the end of the full expression containing the initializer. Example:
struct A {
int&& r;
};
A a1{7}; // OK, lifetime is extended
A a2(7); // well-formed, but dangling reference
NOTE: I am using GCC as this is the only compiler which supports this feature.
Having read this, and knowing that references are polymorphic, I decided to create a simple class:
template <typename Base>
struct Polymorphic {
Base&& obj;
};
So that the following code works:
struct Base {
inline virtual void print() const {
std::cout << "Base !\n";
}
};
struct Derived: public Base {
inline virtual void print() const {
std::cout << "Derived !" << x << "\n";
}
Derived() : x(5) {}
int x;
};
int main() {
Polymorphic<Base> base{Base()};
base.obj.print(); // Base !
Polymorphic<Base> base2{Derived()};
base2.obj.print(); // Derived 5!
}
The problem I encountered is when changing the value of my polymorphic object (not the value of obj, but the value of a polymorphic object itself). Since I can't reassign to r-value references, I tried to do the following using placement new:
#include <new>
int main() {
Polymorphic<Base> base{Base()};
new (&base) Polymorphic<Base>{Derived()};
std::launder(&base)-> obj.print(); // Segmentation fault... Why is that?
return 0;
}
I believe I have a segmentation fault because Polymorphic<Base> has a reference sub object. However, I am using std::launder - isn't that supposed to make it work? Or is this a bug in GCC? If std::launder does not work, how do I tell the compiler not to cache the references?
On a side note, please do not tell me "your code is stupid", "use unique pointers instead"... I know how normal polymorphism works; I asked this question to deepen my understanding of placement new and std::launder :)
[class.temporary]/6.12 states:
A temporary bound to a reference in a new-initializer persists until the completion of the full-expression containing the new-initializer.
It is not picky about how the reference is initialized; this applies to all ways such references get initialized. Indeed, there's even an example:
struct S { int mi; const std::pair<int,int>& mp; };
S a { 1, {2,3} };
S* p = new S{ 1, {2,3} }; // creates dangling reference
Placement-new is a new-initializer. So it applies. Just like the above, base contains a dangling reference. It doesn't matter how you access it after that point; the object it references has been destroyed, so you get UB.
If the lifetime of a temporary is not obvious by the reader of some code, you should not be using a temporary. Just give it a name, and all your problems go away.
Look at the bullet point immediately above the one you quoted in the link you provided:
a temporary bound to a reference in the initializer used in a new-expression exists until the end of the full expression containing that new-expression, not as long as the initialized object. If the initialized object outlives the full expression, its reference member becomes a dangling reference.
In your crashing example, you are using a new expression, so the lifetime is not extended.
For example I have a class that call a function in its consturctor that returns local object. I'm trying to use rvalue references to get access to this object to avoid expensive move of it in memory.
class MyClass
{
BigObject&& C;
MyClass() : C(f())
{
};
};
BigObject f()
{
return BigObject();
}
But compiller tells me that reference member is initialized to a temporary that doesn't persist after the construction exits.
I don't get it. I understand that local objects, created in a scope of a function, exists only in a scope of function. Reaching the end of the scope - destructors of local objects are called. And here I initialize rvalue reference with local object , and I have access to it, while I'm in the body of constuctor.
Can someone explain, what is going on here? And is there a way to return a local object and use it as any ligetable class member, without moving it in memory?
You should remove the && from C.
The code is not as expensive as you imagine: f's return value is a copy elision context. So the compiler is allowed to construct just a single BigObject directly in the memory space for C. Even if a compiler doesn't perform this , it is still a move context so as worst case scenario the object will be moved.
If somehow your object is copyable but not movable then you do have to rely on copy elision but it's hard to imagine a valid use case for such an object.
To quote it from the standard,
Section 12.2.5 The second context is when a reference is bound to
a temporary. The temporary to which the reference is bound or the
temporary that is the complete object of a subobject to which the
reference is bound persists for the lifetime of the reference except:
— A temporary bound to a reference member in a constructor’s
ctor-initializer (12.6.2) persists until the constructor exits.
So, in your case 'C(f())', C gets bound to the temporary only till the constructor exits and after that it's kind of UB and is upto your luck. You should be thankful that the compiler reported a warning :).
Anyhow, there is no need to do such acrobatics, but you can always move from the temporary into 'C'.
class BigObject
{
public:
BigObject() {std::cout << "Cons" << std::endl;}
BigObject(const BigObject& other) {std::cout << "Copy Cons" << std::endl;}
BigObject(BigObject&& other) {std::cout << "Move Cons" << std::endl;}
};
BigObject f()
{
return BigObject();
}
class MyClass
{
public:
BigObject C;
MyClass() : C(f())
{
};
};
In this case compiler can very well elide the copy making the code as efficient as possible by just having to construct 'BigObject' once at the target site.
So, in this particular case it could be possible that your object is not moved at all or even copied. Perhaps that answers your final question.
So in c++ if you assign the return value of a function to a const reference then the lifetime of that return value will be the scope of that reference. E.g.
MyClass GetMyClass()
{
return MyClass("some constructor");
}
void OtherFunction()
{
const MyClass& myClass = GetMyClass(); // lifetime of return value is until the end
// of scope due to magic const reference
doStuff(myClass);
doMoreStuff(myClass);
}//myClass is destructed
So it seems that wherever you would normally assign the return value from a function to a const object you could instead assign to a const reference. Is there ever a case in a function where you would want to not use a reference in the assignment and instead use a object? Why would you ever want to write the line:
const MyClass myClass = GetMyClass();
Edit: my question has confused a couple people so I have added a definition of the GetMyClass function
Edit 2: please don't try and answer the question if you haven't read this:
http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/
If the function returns an object (rather than a reference), making a copy in the calling function is necessary [although optimisation steps may be taken that means that the object is written directly into the resulting storage where the copy would end up, according to the "as-if" principle].
In the sample code const MyClass myClass = GetMyClass(); this "copy" object is named myclass, rather than a temporary object that exists, but isn't named (or visible unless you look at the machine-code). In other words, whether you declare a variable for it, or not, there will be a MyClass object inside the function calling GetMyClass - it's just a matter of whether you make it visible or not.
Edit2:
The const reference solution will appear similar (not identical, and this really just written to explain what I mean, you can't actually do this):
MyClass __noname__ = GetMyClass();
const MyClass &myclass = __noname__;
It's just that the compiler generates the __noname__ variable behind the scenes, without actually telling you about it.
By making a const MyClass myclass the object is made visible and it's clear what is going on (and that the GetMyClass is returning a COPY of an object, not a reference to some already existing object).
On the other hand, if GetMyClass does indeed return a reference, then it is certainly the correct thing to do.
IN some compilers, using a reference may even add an extra memory read when the object is being used, since the reference "is a pointer" [yes, I know, the standard doesn't say that, but please before complaining, do me a favour and show me a compiler that DOESN'T implement references as pointers with extra sugar to make them taste sweeter], so to use a reference, the compiler should read the reference value (the pointer to the object) and then read the value inside the object from that pointer. In the case of the non-reference, the object itself is "known" to the compiler as a direct object, not a reference, saving that extra read. Sure, most compilers will optimise such an extra reference away MOST of the time, but it can't always do that.
One reason would be that the reference may confuse other readers of your code. Not everybody is aware of the fact that the lifetime of the object is extended to the scope of the reference.
The semantics of:
MyClass const& var = GetMyClass();
and
MyClass const var = GetMyClass();
are very different. Generally speaking, you would only use the
first when the function itself returns a reference (and is
required to return a reference by its very semantics). And you
know that you need to pay attention to the lifetime of the
object (which is not under your control). You use the second
when you want to own (a copy of) the object. Using the second
in this case is misleading, can lead to surprises (if the
function also returns a reference to an object which is
destructed earlier) and is probably slightly less efficient
(although in practice, I would expect both to generate exactly
the same code if GetMYClass returns by value).
Performance
As most current compilers elide copies (and moves), both version should have about the same efficiency:
const MyClass& rMyClass = GetMyClass();
const MyClass oMyClass = GetMyClass();
In the second case, either a copy or move is required semantically, but it can be elided per [class.copy]/31. A slight difference is that the first one works for non-copyable non-movable types.
It has been pointed out by Mats Petersson and James Kanze that accessing the reference might be slower for some compilers.
Lifetime
References should be valid during their entire scope just like objects with automatic storage are. This "should" of course is meant to be enforced by the programmer. So for the reader IMO there's no differences in the lifetimes implied by them. Although, if there was a bug, I'd probably look for dangling references (not trusting the original code / the lifetime claim for the reference).
In the case GetMyClass could ever be changed (reasonably) to return a reference, you'd have to make sure the lifetime of that object is sufficient, e.g.
SomeClass* p = /* ... */;
void some_function(const MyClass& a)
{
/* much code with many side-effects */
delete p;
a.do_something(); // oops!
}
const MyClass& r = p->get_reference();
some_function(r);
Ownership
A variable directly naming an object like const MyClass oMyClass; clearly states I own this object. Consider mutable members: if you change them later, it's not immediately clear to the reader that's ok (for all changes) if it has been declared as a reference.
Additionally, for a reference, it's not obvious that the object its referring to does not change. A const reference only implies that you won't change the object, not that nobody will change the object(*). A programmer would have to know that this reference is the only way of referring to that object, by looking up the definition of that variable.
(*) Disclaimer: try to avoid unapparent side effects
I don't understand what you want to achieve. The reason that T const& can be bound (on the stack) to a T (by value) which is returned from a function is to make it possible other function can take this temporary as an T const& argument. This prevents you from requirement to create overloads. But the returned value has to be constructed anyway.
But today (with C++11) you can use const auto myClass = GetMyClass();.
Edit:
As an excample of what can happen I will present something:
MyClass version_a();
MyClass const& version_b();
const MyClass var1 =version_a();
const MyClass var2 =version_b();
const MyClass var3&=version_a();
const MyClass var4&=version_b();
const auto var5 =version_a();
const auto var6 =version_b();
var1 is initialised with the result of version_a()
var2 is initialised with a copy of the object to which the reference returned by version_b() belongs
var3 holds a const reference to to the temoprary which is returned and extends its lifetime
var4 is initialised with the reference returned from version_b()
var5 same as var1
var6 same as var4
They are semanticall all different. var3 works for the reason I gave above. Only var5 and var6 store automatically what is returned.
there is a major implication regarding the destructor actually being called. Check Gotw88, Q3 and A3. I put everything in a small test program (Visual-C++, so forgive the stdafx.h)
// Gotw88.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <iostream>
class A
{
protected:
bool m_destroyed;
public:
A() : m_destroyed(false) {}
~A()
{
if (!m_destroyed)
{
std::cout<<"A destroyed"<<std::endl;
m_destroyed=true;
}
}
};
class B : public A
{
public:
~B()
{
if (!m_destroyed)
{
std::cout<<"B destroyed"<<std::endl;
m_destroyed=true;
}
}
};
B CreateB()
{
return B();
}
int _tmain(int argc, _TCHAR* argv[])
{
std::cout<<"Reference"<<std::endl;
{
const A& tmpRef = CreateB();
}
std::cout<<"Value"<<std::endl;
{
A tmpVal = CreateB();
}
return 0;
}
The output of this little program is the following:
Reference
B destroyed
Value
B destroyed
A destroyed
Here a small explanation for the setup. B is derived from A, but both have no virtual destructor (I know this is a WTF, but here it's important). CreateB() returns B by value. Main now calls CreateB and first stores the result of this call in a const reference of type A. Then CreateB is called and the result is stored in a value of type A.
The result is interesting. First - if you store by reference, the correct destructor is called (B), if you store by value, the wrong one is called. Second - if you store in a reference, the destructor is called only once, this means there is only one object. By value results in 2 calls (to different destructors), which means there are 2 objects.
My advice - use the const reference. At least on Visual C++ it results in less copying. If you are unsure about your compiler, use and adapt this test program to check the compiler. How to adapt? Add copy / move constructor and copy-assignment operator.
I quickly added copy & assignment operators for class A & B
A(const A& rhs)
{
std::cout<<"A copy constructed"<<std::endl;
}
A& operator=(const A& rhs)
{
std::cout<<"A copy assigned"<<std::endl;
}
(same for B, just replace every capital A with B)
this results in the following output:
Reference
A constructed
B constructed
B destroyed
Value
A constructed
B constructed
A copy constructed
B destroyed
A destroyed
This confirms the results from above (please note, the A constructed results from B being constructed as B is derived from A and thus As constructor is called whenever Bs constructor is called).
Additional tests: Visual C++ accepts also the non-const reference with the same result (in this example) as the const reference. Additionally, if you use auto as type, the correct destructor is called (of course) and the return value optimization kicks in and in the end it's the same result as the const reference (but of course, auto has type B and not A).
Even if the subject was discussed many times around here, I can't find a conclusive explanation regarding my particular case. Will const extend the lifetime of the RefTest temporary? Is the below example legal?
#include <iostream>
class RefTest
{
public:
RefTest(const std::string &input) : str(input) {}
~RefTest () {std::cout << "RefTest" << std::endl;}
private:
std::string str;
};
class Child
{
public:
Child (const RefTest &ref) : ref_m(ref) {}
~Child () {std::cout << "Test" << std::endl;}
private:
const RefTest &ref_m;
};
class Test
{
public:
Test () : child(RefTest("child")) {}//Will the temporary get destroyed here?
~Test () {std::cout << "Test" << std::endl;}
private:
const Child child;
};
int main ()
{
Test test;
}
The reference does not extend the lifetime. The code is legal, but only because you never access ref_m after the constructor finishes.
The temporary is bound to the constructor parameter, ref. Binding another reference to it later, ref_m, doesn't extend the lifetime. If it did, you'd have an object on the stack which has to persist as long as the reference member it's bound to, which could be allocated on the heap, so the compiler would be unable to unwind the stack when the constructor returns.
It would be nice to get a warning, but compilers aren't perfect and some things are difficult to warn about. The temporary is created in a different context from where it's bound to a reference, so the compiler can only tell there's a problem with inlinging turned on, or some clever static analysis.
The C++ standard states:
The second context is when a reference is bound to a temporary. The
temporary to which the reference is bound or the temporary that is the
complete object to a subobject of which the temporary is bound
persists for the lifetime of the reference except as specified below.
A temporary bound to a reference member in a constructor’s
ctor-initializer (12.6.2) persists until the constructor exits. A
temporary bound to a reference parameter in a function call (5.2.2)
persists until the completion of the full expression containing the
call.
NOTE: And by the way, this is duplicate (1, 2), you should search better, next time... :)
Simple example:
struct A
{
A() : i(int()) {}
const int& i;
};
Error from gcc:
a temporary bound to 'A::i' only persists until the constructor exits
Rule from 12.2p5:
A temporary bound to a reference member in a constructor’s
ctor-initializer (12.6.2) persists until the constructor exits.
Question
Does anybody know the rationale for this rule? It would seem to me that allowing the temporary to live until reference dies would be better.
I don't think the not extending to the object lifetime needs justification. The opposite would!
The lifetime extension of a temporary extends merely to the enclosing scope, which is both natural and useful. This is because we have tight control on the lifetime of the receiving reference variable. By contrast, the class member isn't really "in scope" at all. Imagine this:
int foo();
struct Bar
{
Bar(int const &, int const &, int const &) : /* bind */ { }
int & a, & b, & c;
};
Bar * p = new Bar(foo(), foo(), foo());
It'd be nigh impossible to define a meaningful lifetime extension for the foo() temporaries.
Instead, we have the default behaviour that the lifetime of the foo() temporary extends to the end of the full-expression in which it is contained, and no further.
The int() in your constructor initialization list is on the stack.
Once that value is set, int() goes out of the scope and the reference becomes invalid.
What memory would it live in?
For it to work the way you propose, it can't be on the stack since it has to live longer than any single function call. It can't be put after struct A in memory, because it would have no way to know how A was allocated.
At best, it would have to be turned into a secret heap allocation. That would then require a corresponding secret deallocation when the destructor runs. The copy constructor and assignment operator would need secret behavior too.
I don't know if anyone ever actually considered such a way of doing it. But personally, it sounds like too much extra complexity for a rather obscure feature.