I'm reviewing a colleague's code, and I see he has several constants defined in the global scope as:
const string& SomeConstant = "This is some constant text";
Personally, this smells bad to me because the reference is referring to what I'm assuming is an "anonymous" object constructed from the given char array.
Syntactically, it's legal (at least in VC++ 7), and it seems to run, but really I'd rather have him remove the & so there's no ambiguity as to what it's doing.
So, is this TRULY safe and legal and I'm obsessing? Does the temp object being constructed have a guaranteed lifetime? I had always assumed anonymous objects used in this manner were destructed after use...
So my question could also be generalized to anonymous object lifetime. Does the standard dictate the lifetime of an anonymous object? Would it have the same lifetime as any other object in that same scope? Or is it only given the lifetime of the expression?
Also, when doing it as a local, it's obviously scoped differently:
#include <iostream>
#include <string>
using namespace std;
class A
{
string _str;
public:
A(const string& str) :
_str(str)
{
cout << "Constructing A(" << _str << ")" << endl;
}
~A()
{
cout << "Destructing A(" << _str << ")" << endl;
}
};
void TestFun()
{
A("Outer");
cout << "Hi" << endl;
}
Shows:
Constructing A(Outer)
Destructing A(Outer)
Hi
It's completely legal. It will not be destructed until the program ends.
EDIT: Yes, it's guaranteed:
"All objects which do not have dynamic
storage duration, do not have thread
storage duration, and are not local
have static storage duration. The
storage for these objects shall last
for the duration of the program
(3.6.2, 3.6.3)."
-- 2008 Working Draft, Standard for Programming Language C++, § 3.7.1 p. 63
As Martin noted, this is not the whole answer. The standard draft further notes (§ 12.2, p. 250-1):
"Temporaries of class type are created
in various contexts: binding an rvalue
to a reference (8.5.3) [...] Even when
the creation of the temporary object
is avoided (12.8), all the semantic
restrictions shall be respected as if
the temporary object had been created.
[...] Temporary objects are destroyed
as the last step in evaluating the
full-expression (1.9) that (lexically)
contains the point where they were
created. [...] There are two contexts
in which temporaries are destroyed at
a different point than the end of the
full-expression. [...] The second
context is when a reference is bound
to a temporary. The temporary to which
the reference is bound or the
temporary that is the complete object
of a subobject to which the reference
is bound persists for the lifetime of
the reference except as specified
below."
I tested in g++ if that makes you feel any better. ;)
Yes it is valid and legal.
const string& SomeConstant = "This is some constant text";
// Is equivalent to:
const string& SomeConstant = std::string("This is some constant text");
Thus you are creating a temporary object.
This temporary object is bound to a const& and thus has its lifetime extended to the lifespan of the variable it is bound to (i.e. longer than the expression in which it was created).
This is guaranteed by the standard.
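A minimal sketch of that guarantee, assuming a C++11 or later compiler. The Tracked type, the g_tracked_destroyed counter and the function name are hypothetical helpers made up for this illustration; they are not from the question. The idea is to count destructor calls and compare an unbound temporary with one bound to a namespace-scope const reference:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical counter and type, introduced only for this illustration.
int g_tracked_destroyed = 0;

struct Tracked {
    std::string text;
    explicit Tracked(std::string t) : text(std::move(t)) {}
    ~Tracked() { ++g_tracked_destroyed; }
};

// Namespace-scope const reference bound to a temporary: the temporary's
// lifetime is extended to that of the reference, so it is not destroyed
// while the program is still running main().
const Tracked& kSomeConstant = Tracked("This is some constant text");

// Creates one unbound temporary and reports how many Tracked objects
// have been destroyed so far.
int destroyed_after_unbound_temporary() {
    Tracked("short-lived");  // destroyed at the end of this full-expression
    // The temporary bound to kSomeConstant is still alive and readable.
    assert(kSomeConstant.text == "This is some constant text");
    return g_tracked_destroyed;
}
```

Calling destroyed_after_unbound_temporary() from main() should only ever count the short-lived temporaries, never the one behind kSomeConstant.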
Note:
Though it is legal, I would not use it. The easiest solution would be to convert it into a const std::string.
Usage:
In this situation, because the variable is in global scope, it is valid for the full length of the program. So it can be used as soon as execution enters main() and should not be accessed after execution exits main().
Though it technically may be available before this, your usage of it in constructors/destructors of global objects should be tempered by the known problem of global variable initialization order.
Extra Thoughts:
This on the other hand will not suffer from the problem:
char const* SomeConstant = "This is some constant text";
And can be used at any point. Just a thought.
It might be legal, but still ugly. Leave out the reference !
const string SomeConstant = "This is some constant text";
It's as legal as it's ugly.
It's legal to extend the lifetime of a temporary with a const reference. This is used by Alexandrescu's ScopeGuard; see this excellent explanation by Herb Sutter called A candidate for the "Most important const".
That being said this specific case is an abuse of this feature of C++ and the reference should be removed leaving a plain const string.
Declaring it as const (which means it can't be changed) and then making it a reference, which implies that someone might change it, seems like bad form at the very least. Plus, as I am sure you understand, global variables are BAD, and rarely necessary.
Okay, folks, correct me if I'm off the deep end, but here are my conclusions after listening to all of your excellent responses:
A) it is syntactically and logically legal, the & extends the lifetime of the temp/anonymous from beyond expression level to the life of the reference. I verified this in VC++7 with:
#include <iostream>
using namespace std;
class A {
public: A() { cout << "constructing A" << endl; }
public: ~A() { cout << "destructing A" << endl; }
};
void Foo()
{
A();
cout << "Foo" << endl;
}
void Bar()
{
const A& someA = A();
cout << "Bar" << endl;
}
int main()
{
Foo(); // outputs constructing A, destructing A, Foo
Bar(); // outputs constructing A, Bar, destructing A
return 0;
}
B) Though it is legal, it can lead to some confusion as to the actual lifetime, and the reference in these cases gives you no benefit over declaring it as a non-reference. The reference should therefore probably be avoided, and it may even take extra space. Since there's no benefit to it, it's unnecessary obfuscation.
Thanks for all the answers; it was a very interesting discussion. So the long and short of it: yes, it's syntactically legal; no, it's not technically dangerous, as the lifetime is extended; but it adds nothing and may add cost and confusion, so why bother.
Sound right?
Related
Suppose I have the following code:
#include <iostream>
#include <string>
#include <string_view>
void some_function(std::string_view view) {
std::cout << view << '\n';
}
int main() {
some_function(std::string{"hello, world"}); // ???
}
Will view inside some_function be referring to a string which has been destroyed? I'm confused because, considering this code:
std::string_view view(std::string{"hello, world"});
Produces the warning (from clang++):
warning: object backing the pointer will be destroyed at the end of the full-expression [-Wdangling-gsl]
What's the difference?
(Strangely enough, using braces {} rather than brackets () to initialise the string_view above eliminates the warning. I've no idea why that is either.)
To be clear, I understand the above warning (the string_view outlives the string, so it holds a dangling pointer). What I'm asking is why passing a string into some_function doesn't produce the same warning.
std::string_view is nothing other than std::basic_string_view<char>, so let's see its documentation on cppreference:
The class template basic_string_view describes an object that can refer to a constant contiguous sequence of char-like objects with the first element of the sequence at position zero.
A typical implementation holds only two members: a pointer to constant CharT and a size.
The sentence about a typical implementation tells us why clang is right about std::string_view view(std::string{"hello, world"});: as others have commented, after the declaration is done, std::string{"hello, world"} is destroyed and the underlying pointer that the std::string_view holds dangles.
Clearly that's just a typical implementation, but since we know it is correct, it tells us at least that the standard doesn't require any implementation to do something special to keep temporaries alive.
some_function(std::string{"hello, world"}); is completely safe, as long as the function doesn't preserve the string_view for later use.
The temporary std::string is destroyed at the end of this full-expression (roughly speaking, at this ;), so it's destroyed after the function returns.
std::string_view view(std::string{"hello, world"}); always produces a dangling string_view, regardless of whether you use () or {}. If the choice of brackets affects compiler warnings, it's a compiler defect.
Is it safe to pass an std::string temporary into an std::string_view parameter?
In general, it isn't necessarily safe. It depends on what the function does. If you don't know, then you shouldn't assume it to be safe.
Knowing the definition of the function as shown, it is safe to call the example function with a temporary string.
Will view inside some_function be referring to a string which has been destroyed?
Not in this case, because the temporary argument string - which the string view refers to - hasn't been destroyed.
What's the difference?
The parameter of the function has shorter lifetime than the lifetime of the temporary passed as the argument. The lifetime of the string view variable is longer than the lifetime of the temporary argument passed to the constructor.
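A small sketch of the two safe patterns implied above, assuming C++17 for std::string_view. The names count_char, Label and demo are hypothetical and only serve the illustration. The first function uses the view only while the caller's temporary is still alive; the class copies the characters into its own std::string when it needs to keep them:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Safe: the view is consumed before the call returns, and the caller's
// temporary std::string lives until the end of the calling full-expression.
std::size_t count_char(std::string_view view, char wanted) {
    std::size_t count = 0;
    for (char ch : view) {
        if (ch == wanted) ++count;
    }
    return count;
}

// Safe way to keep the text beyond the call: copy it out of the view
// instead of storing the view itself.
class Label {
public:
    explicit Label(std::string_view text) : text_(text) {}
    const std::string& text() const { return text_; }
private:
    std::string text_;  // owns its characters
};

void demo() {
    // The temporary string outlives the parameter, so this is fine.
    assert(count_char(std::string{"hello, world"}, 'o') == 2);

    // The temporary is gone after this statement, but Label holds a copy.
    Label label{std::string{"hello, world"}};
    assert(label.text() == "hello, world");
}
```

Had Label stored the std::string_view itself, it would have been left with the same dangling pointer that the clang warning describes.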
Just as others have said, some_function(std::string{"hello, world"}); is totally safe since it passes it by value and stays in scope until the function ends. If safety is all you are concerned with, that will do. If performance could be an issue, I'd recommend using an rvalue reference here, like so:
void some_function(std::string_view&& view)
{
std::cout << "rval reference: " << view << '\n';
}
int main()
{
some_function(std::string{"hello, world"});
}
R-value references are great if you are going to use some_function() mainly for temporary values.
I wrote the following program and expected that the rvalue gotten from std::move() would be destroyed right after it's used in the function call:
#include <iostream>
#include <utility>
struct A
{
A(){ }
A(const A&){ std::cout << "A&" << std::endl; }
~A(){ std::cout << "~A()" << std::endl; }
A operator=(const A&){ std::cout << "operator=" << std::endl; return A();}
};
void foo(const A&&){ std::cout << "foo()" << std::endl; }
int main(){
const A& a = A();
foo(std::move(a)); //after evaluation the full-expression
//rvalue should have been destroyed
std::cout << "before ending the program" << std::endl;
}
But it was not. The following output was produced instead:
foo()
before ending the program
~A()
As said in the answer
rvalues denote temporary objects which are destroyed at the next
semicolon
What did I get wrong?
std::move does not make a into a temporary value. Rather it creates an rvalue reference to a, which is used in function foo. In this case std::move is not doing anything for you.
The point of std::move is that you can indicate that a move constructor should be used instead of a copy constructor, or that a function being called is free to modify the object in a destructive way. It doesn't automatically cause your object to be destructed.
So what std::move does here is that if it wanted to, the function foo could modify a in a destructive way (since it takes an rvalue reference as its argument). But a is still an lvalue. Only the reference is an rvalue.
There's a great reference here that explains rvalue references in detail, perhaps that will clear a few things up.
Remember: std::move doesn't move the object. std::move is a simple cast that takes an lvalue and makes it look like an rvalue.
foo, by taking an argument by rvalue reference, says that the input object will be modified, but left in a valid state. Nothing here about destroying the object.
In the end, a remains an lvalue, no matter how much you try to cast it.
std::move(a) does not alter a to become an rvalue.
Instead, it creates an rvalue reference to a.
Edit:
Note that, with your line
const A& a = A();
you rely on the special case of local const references prolonging the lives of temporaries (see e.g. http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/). This feature predates C++11. As temporaries are basically rvalues, I understand now where your confusion comes from.
Note that by prolonging the life of the temporary (by means of assigning it to the local const reference), your object referenced by a cannot be classified as "object(s) which are destroyed at the next semicolon". Rather, it lives as long as the reference.
Perhaps someone else can locate this rule (described in http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/) in the C++11 standard, and likewise the source of your sentence "rvalues denote temporary objects which are destroyed at the next semicolon".
The rvalue you get from std::move is an rvalue reference, and references don't have a destructor. You can't get to that reference anymore. So why don't you think it has been destroyed?
You didn't get anything wrong; rather, it is the committee that made a mistake in giving std::move that name. It's just a cast, making it easier to "select" move constructors and move assignment operators that actually perform moves. In this case you have neither, so the std::move does literally nothing. You're seeing the original object being destroyed when it goes out of scope as usual.
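To make the "it's just a cast" point concrete, here is a short sketch assuming C++17 (for std::is_same_v). The functions observe, take and demo are hypothetical names for this illustration. It checks the type std::move produces, shows that merely binding an rvalue reference leaves the object untouched, and contrasts that with a function that really invokes a move constructor:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// std::move only changes the value category: T& becomes T&&, and a const
// lvalue becomes a const rvalue (as with the const A& in the question).
static_assert(std::is_same_v<decltype(std::move(std::declval<std::string&>())),
                             std::string&&>, "non-const case");
static_assert(std::is_same_v<decltype(std::move(std::declval<const std::string&>())),
                             const std::string&&>, "const case");

// Binds an rvalue reference but neither moves from nor destroys the object.
const std::string* observe(std::string&& s) { return &s; }

// Actually moves: the move constructor of std::string is invoked here.
std::string take(std::string&& s) { return std::string(std::move(s)); }

void demo() {
    std::string text = "some text";

    const std::string* seen = observe(std::move(text));
    assert(seen == &text);        // same object, no copy was made
    assert(text == "some text");  // and it was not modified

    std::string taken = take(std::move(text));
    assert(taken == "some text");
    // text is still a live object here; its value is valid but unspecified,
    // and its destructor runs at the end of this scope as usual.
}
```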
According to Herb Sutter's article http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/, the following code is correct:
#include <iostream>
#include <vector>
using namespace std;
vector<vector<int>> f() { return {{1},{2},{3},{4},{5}}; }
int main()
{
const auto& v = f();
cout << v[3][0] << endl;
}
i.e. the lifetime of the temporary returned by f() is extended to the lifetime of the const reference v.
And indeed this compiles fine with gcc and clang and runs without leaks according to valgrind.
However, when I change the main function thusly:
int main()
{
const auto& v = f()[3];
cout << v[0] << endl;
}
it still compiles, but valgrind warns me of invalid reads in the second line of the function because the memory was freed in the first line.
Is this standard compliant behaviour or could this be a bug in both g++ (4.7.2) and clang (3.5.0-1~exp1)?
If it is standard compliant, it seems pretty weird to me... oh well.
There's no bug here except in your code.
The first example works because, when you bind the result of f() to v, you extend the lifetime of that result.
In the second example you don't bind the result of f() to anything, so its lifetime is not extended. Binding to a subobject of it would count:
[C++11: 12.2/5]: The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except: [..]
…but you're not doing that: you're binding to the result of calling a member function (e.g. operator[]) on the object, and that result is not a data member of the vector!
(Notably, if you had an std::array rather than an std::vector, then the code† would be absolutely fine as array data is stored locally, so elements are subobjects.)
So, you have a dangling reference to a logical element of the original result of f() which has long gone out of scope.
† Sorry for the horrid initializers but, well, blame C++.
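The subobject rule can be observed with a destructor counter. This is only a sketch: Box, make_box, g_boxes_destroyed and the two function names are hypothetical, and the behaviour shown is what the quoted 12.2/5 wording leads one to expect from a conforming compiler. Binding directly to a data member keeps the whole temporary alive, while binding to a reference returned by a member function does not:

```cpp
#include <cassert>

// Hypothetical counter and type for this illustration.
int g_boxes_destroyed = 0;

struct Box {
    int value;
    explicit Box(int v) : value(v) {}
    ~Box() { ++g_boxes_destroyed; }
    const int& get() const { return value; }  // returns a reference, like operator[]
};

Box make_box() { return Box(42); }

// Reference bound to a subobject (data member) of the temporary:
// the complete Box persists for the lifetime of the reference.
int destroyed_while_bound_to_member() {
    const int before = g_boxes_destroyed;
    const int& direct = make_box().value;
    assert(direct == 42);                // safe to read
    return g_boxes_destroyed - before;   // Box not destroyed yet
}

// Reference bound to the result of a member function call:
// the Box is destroyed at the end of the full-expression.
int destroyed_while_bound_to_call_result() {
    const int before = g_boxes_destroyed;
    const int& dangling = make_box().get();
    (void)dangling;                      // dangling: must not be read
    return g_boxes_destroyed - before;   // Box already destroyed
}
```

For the vector case in the question, binding the whole result first (const auto& v = f();) and indexing afterwards avoids the dangling reference.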
In 12.2 of C++11 standard:
The temporary to which the reference is bound or the temporary that is
the complete object of a subobject to which the reference is bound
persists for the lifetime of the reference except:
1. A temporary bound to a reference member in a constructor’s ctor-initializer (12.6.2) persists until the constructor exits.
2. A temporary bound to a reference parameter in a function call (5.2.2) persists until the completion of the full-expression containing the call.
3. The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.
4. A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the full-expression containing the new-initializer.
And there is an example of the last case in the standard:
struct S {
int mi;
const std::pair<int,int>& mp;
};
S a { 1,{2,3} }; // No problem.
S* p = new S{ 1, {2,3} }; // Creates dangling reference
To me, 2. and 3. make sense and are easy to agree with. But what's the reason behind 1. and 4.? The example looks just evil to me.
As with many things in C and C++, I think this boils down to what can be reasonably (and efficiently) implemented.
Temporaries are generally allocated on the stack, and code to call their constructors and destructors are emitted into the function itself. So if we expand your first example into what the compiler is actually doing, it would look something like:
struct S {
int mi;
const std::pair<int,int>& mp;
};
// Case 1:
std::pair<int,int> tmp{ 2, 3 };
S a { 1, tmp };
The compiler can easily extend the life of the tmp temporary long enough to keep "S" valid because we know that "S" will be destroyed before the end of the function.
But this doesn't work in the "new S" case:
struct S {
int mi;
const std::pair<int,int>& mp;
};
// Case 2:
std::pair<int,int> tmp{ 2, 3 };
// Whoops, this heap object will outlive the stack-allocated
// temporary!
S* p = new S{ 1, tmp };
To avoid the dangling reference, we would need to allocate the temporary on the heap instead of the stack, something like:
// Case 2a -- compiler tries to be clever?
// Note that the compiler won't actually do this.
std::pair<int,int>& tmp = *new std::pair<int,int>{ 2, 3 };
S* p = new S{ 1, tmp };
But then a corresponding delete p would need to free this heap memory! This is quite contrary to the behavior of references, and would break anything that uses normal reference semantics:
// No way to implement this that satisfies case 2a but doesn't
// break normal reference semantics.
delete p;
So the answer to your question is: the rules are defined that way because it is more or less the only practical solution given C++'s semantics around the stack, heap, and object lifetimes.
WARNING: @Potatoswatter notes below that this doesn't seem to be implemented consistently across C++ compilers, and therefore is non-portable at best for now. See his example for how Clang doesn't do what the standard seems to mandate here. He also says that the situation "may be more dire than that" -- I don't know exactly what this means, but it appears that in practice this case in C++ has some uncertainty surrounding it.
The main thrust is that reference extension only occurs when the lifetime can be easily and deterministically determined, and this fact can be deduced as possible on the line of code where the temporary is created.
When you call a function, it is extended to the end of the full-expression containing the call (roughly, the end of the current statement). That is long enough, and easy to determine.
When you create an automatic storage reference "on the stack", the scope of that automatic storage reference can be deterministically determined. The temporary can be cleaned up at that point. (Basically, create an anonymous automatic storage variable to store the temporary)
In a new expression, the point of destruction cannot be statically determined at the point of creation. It is whenever the delete occurs. If we wanted the delete to (sometimes) destroy the temporary, then our reference "binary" implementation would have to be more complicated than a pointer, instead of less or equal. It would sometimes own the referred to data, and sometimes not. So that is a pointer, plus a bool. And in C++ you don't pay for what you don't use.
The same holds in a constructor, because you cannot know if the constructor was in a new or a stack allocation. So any lifetime extension cannot be statically understood at the line in question.
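The function-call case is the easiest one to check for yourself. The sketch below uses hypothetical names (Noisy, g_noisy_destroyed and the two functions) and simply counts destructor calls: the temporary bound to the reference parameter is still alive while the callee runs and is gone once the calling statement has finished.

```cpp
#include <cassert>

// Hypothetical counter and type for this illustration.
int g_noisy_destroyed = 0;

struct Noisy {
    int n;
    explicit Noisy(int v) : n(v) {}
    ~Noisy() { ++g_noisy_destroyed; }
};

// Reports how many Noisy objects have been destroyed while the callee runs.
int destroyed_during_call(const Noisy& arg) {
    assert(arg.n > 0);  // the argument temporary is alive for the whole call
    return g_noisy_destroyed;
}

// Returns how many Noisy objects were destroyed by the time the statement
// containing the call has completed.
int destroyed_after_statement() {
    const int before = g_noisy_destroyed;
    const int during = destroyed_during_call(Noisy(1)) - before;
    assert(during == 0);                // nothing destroyed inside the call
    return g_noisy_destroyed - before;  // the temporary is gone by now
}
```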
How long do you want the temporary object to last? It has to be allocated somewhere.
It can't be on the heap because it would leak; there is no applicable automatic memory management. It can't be static because there can be more than one. It must be on the stack. Then it either lasts until the end of the expression or the end of the function.
Other temporaries in the expression, perhaps bound to function call parameters, are destroyed at the end of the expression, and persisting until the end of the function or "{}" scope would be an exception to the general rules. So by deduction and extrapolation of the other cases, the full-expression is the most reasonable lifetime.
I'm not sure why you say this is no problem:
S a { 1,{2,3} }; // No problem.
The dangling reference is the same whether or not you use new.
Instrumenting your program and running it in Clang produces these results:
#include <iostream>
struct noisy {
int n;
~noisy() { std::cout << "destroy " << n << "\n"; }
};
struct s {
noisy const & r;
};
int main() {
std::cout << "create 1 on stack\n";
s a {noisy{ 1 }}; // Temporary created and destroyed.
std::cout << "create 2 on heap\n";
s* p = new s{noisy{ 2 }}; // Creates dangling reference
}
create 1 on stack
destroy 1
create 2 on heap
destroy 2
The object bound to the class member reference does not have an extended lifetime.
Actually I'm sure this is the subject of a known defect in the standard, but I don't have time to delve in right now…
C++11 §12.1/14:
During the construction of a const object, if the value of the object or any of its subobjects is accessed through an lvalue that is
not obtained, directly or indirectly, from the constructor’s this
pointer, the value of the object or subobject thus obtained is
unspecified. [Example:
struct C;
void no_opt(C*);
struct C {
int c;
C() : c(0) { no_opt(this); }
};
const C cobj;
void no_opt(C* cptr) {
// value of cobj.c is unspecified
int i = cobj.c * 100;
cptr->c = 1;
// value of cobj.c is unspecified
cout << cobj.c * 100 << '\n';
}
Compiling the above example outputs 100. My question is: why should the value of cobj.c be unspecified when the initialization list sets it to 0 before entering the constructor body? How does this behavior differ if a non-const object is used?
Genuinely const objects may be treated by the compiler as legitimate constants. It can assume their values never change or even store them in const memory, e.g. ROM or Flash. So, you need to use the non-const access path provided by this as long as the object is, in fact, not constant. This condition only exists during object construction and destruction.
Offhand, I think there does not need to be a corresponding requirement for destructors because the object lifetime has already ended and cobj.c is inaccessible as soon as the destructor for cobj begins.
As Matthieu mentions, it is a strong "code smell" to be accessing an object besides through this during construction or destruction. Reviewing C++11 §3.8 [basic.life] ¶1 and 6, it would appear that cobj.c inside the constructor is UB for the same reason it is inside the destructor, regardless of the object being const or §12.1/14, because its lifetime does not begin until initialization is complete (the constructor returns).
It might be likely to work, but it will ring alarms for good C++ programmers, and by the book it is illegal.
The reason for the quoted rule is to allow the compiler to make
optimizations based on the const-ness of the object. For example,
depending on optimization, your compiler might replace the second
cobj.c * 100 in no_opt with i. More likely, in this particular
case, the optimizer will suppress the i and its initialization
completely, so the code will appear to work. But this might not be the
case if you also output i, before changing cptr->c; it all depends
on how aggressively the compiler optimizes. But the compiler is allowed to
assume that *cptr is not an alias for cobj, because cobj is a
const object, whereas you modify through *cptr, so it cannot point to
a const object without undefined behavior.
If the object isn't const, of course, the issue doesn't occur; the
compiler must always take into account a possible aliasing between
*cptr and cobj.
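For contrast, here is a sketch of the well-defined variant, in which the helper only ever touches the object through the pointer obtained from this and never through the name of the const object. Counter, bump_via_pointer and kCounter are hypothetical names for this illustration, not part of the standard's example:

```cpp
#include <cassert>

struct Counter;
void bump_via_pointer(Counter* self);

struct Counter {
    int c;
    // const semantics do not apply yet while the constructor is running,
    // so modifying the object through `this` is fine.
    Counter() : c(0) { bump_via_pointer(this); }
};

// Reads and writes only through the pointer derived from `this`;
// it never names the const object directly.
void bump_via_pointer(Counter* self) {
    const int before = self->c;
    self->c = before + 1;
    assert(self->c == before + 1);  // value read through `self` is well specified
}

// A genuinely const object whose constructor uses the helper above.
const Counter kCounter;
```

Once the constructor has returned, kCounter.c can be read through the const object as usual.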