int LegacyFunction(const char *s) {
// do something with s, like print it to standard output
// this function does NOT retain any pointer to s after it returns.
return strlen(s);
}
std::string ModernFunction() {
// do something that returns a string
return "Hello";
}
LegacyFunction(ModernFunction().c_str());
The above example could easily be rewritten to use smart pointers instead of strings; I've encountered both of these situations many times. Anyway, the above example will construct an STL string in ModernFunction, return it, then get a pointer to a C-style string inside of the string object, and then pass that pointer to the legacy function.
There is a temporary string object that exists after ModernFunction has returned. When does it go out of scope?
Is it possible for the compiler to call c_str(), destruct this temporary string object, and then pass a dangling pointer to LegacyFunction? (Remember that the string object is managing the memory that c_str() return value points to...)
If the above code is not safe, why is it not safe, and is there a better, equally concise way to write it than adding a temporary variable when making the function calls? If it's safe, why?
LegacyFunction(ModernFunction().c_str());
The temporary is destroyed after the full-expression has been evaluated (i.e. after LegacyFunction returns).
n3337 12.2/3
Temporary objects are destroyed as the last step
in evaluating the full-expression (1.9) that (lexically) contains the point where they were created.
n3337 1.9/10
A full-expression is an expression that is not a subexpression of another expression. If a language construct
is defined to produce an implicit call of a function, a use of the language construct is considered to be an
expression for the purposes of this definition. A call to a destructor generated at the end of the lifetime of
an object other than a temporary object is an implicit full-expression. Conversions applied to the result of
an expression in order to satisfy the requirements of the language construct in which the expression appears
are also considered to be part of the full-expression.
[ Example:
struct S {
S(int i): I(i) { }
int& v() { return I; }
private:
int I;
};
S s1(1); // full-expression is call of S::S(int)
S s2 = 2; // full-expression is call of S::S(int)
void f() {
if (S(3).v()) // full-expression includes lvalue-to-rvalue and
// int to bool conversions, performed before
// temporary is deleted at end of full-expression
{ }
}
There is a temporary string object that exists after ModernFunction has returned. When does it go out of scope?
Strictly speaking, it's never in scope. Scope is a property of a name, not an object. It just so happens that automatic variables have a very close association between scope and lifetime. Objects that aren't automatic variables are different.
Temporary objects are destroyed at the end of the full-expression in which they appear, with a couple of exceptions that aren't relevant here. Anyway the special cases extend the lifetime of the temporary, they don't reduce it.
Is it possible for the compiler to call c_str(), destruct this temporary string object, and then pass a dangling pointer to LegacyFunction
No, because the full-expression is LegacyFunction(ModernFunction().c_str()) (excluding the semi-colon: feel that pedantry), so the temporary that is the return value of ModernFunction is not destroyed until LegacyFunction has returned.
If it's safe, why?
Because the lifetime of the temporary is long enough.
In general with c_str, you have to worry about two things. First, the pointer it returns becomes invalid if the string is destroyed (which is what you're asking). Second, the pointer it returns becomes invalid if the string is modified. You haven't worried about that here, but it's OK, you don't need to, because nothing modifies the string either.
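To make the ordering visible, here is a minimal sketch. NoisyString and event_log are hypothetical helpers invented for this illustration (a stand-in for std::string that logs its own destruction); only LegacyFunction and ModernFunction mirror the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical event log, used only to make the order of events observable.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for std::string: owns a buffer and logs its destruction.
class NoisyString {
public:
    explicit NoisyString(const char* text) : data_(text) {}
    ~NoisyString() { event_log().push_back("temporary destroyed"); }
    const char* c_str() const { return data_.c_str(); }
private:
    std::string data_;
};

std::size_t LegacyFunction(const char* s) {
    event_log().push_back("LegacyFunction running");
    return std::strlen(s);  // the buffer behind s is still alive here
}

NoisyString ModernFunction() { return NoisyString("Hello"); }

// Calls the legacy function on a temporary and records what happens when.
std::size_t call_with_temporary() {
    event_log().clear();
    std::size_t n = LegacyFunction(ModernFunction().c_str());
    event_log().push_back("statement finished");
    assert(n == 5);  // LegacyFunction saw the complete string
    return n;
}
```

After call_with_temporary() returns, the last three log entries are "LegacyFunction running", "temporary destroyed", "statement finished", in that order: the destructor runs only once the call has returned.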
Suppose I have the following code:
void some_function(std::string_view view) {
std::cout << view << '\n';
}
int main() {
some_function(std::string{"hello, world"}); // ???
}
Will view inside some_function be referring to a string which has been destroyed? I'm confused because, considering this code:
std::string_view view(std::string{"hello, world"});
Produces the warning (from clang++):
warning: object backing the pointer will be destroyed at the end of the full-expression [-Wdangling-gsl]
What's the difference?
(Strangely enough, using braces {} rather than brackets () to initialise the string_view above eliminates the warning. I've no idea why that is either.)
To be clear, I understand the above warning (the string_view outlives the string, so it holds a dangling pointer). What I'm asking is why passing a string into some_function doesn't produce the same warning.
std::string_view is nothing other than std::basic_string_view<char>, so let's see its documentation on cppreference:
The class template basic_string_view describes an object that can refer to a constant contiguous sequence of char-like objects with the first element of the sequence at position zero.
A typical implementation holds only two members: a pointer to constant CharT and a size.
The sentence about the typical implementation tells us why clang is right about std::string_view view(std::string{"hello, world"});: as others have commented, it's because after the declaration is done, std::string{"hello, world"} is destroyed and the underlying pointer that the std::string_view holds dangles.
Clearly that's just a typical implementation, but since we know it is correct, it tells us at least that the standard doesn't require any implementation to do something special to keep temporaries alive.
some_function(std::string{"hello, world"}); is completely safe, as long as the function doesn't preserve the string_view for later use.
The temporary std::string is destroyed at the end of this full-expression (roughly speaking, at this ;), so it's destroyed after the function returns.
std::string_view view(std::string{"hello, world"}); always produces a dangling string_view, regardless of whether you use () or {}. If the choice of brackets affects compiler warnings, it's a compiler defect.
Is it safe to pass an std::string temporary into an std::string_view parameter?
In general, it isn't necessarily safe. It depends on what the function does. If you don't know, then you shouldn't assume it to be safe.
Knowing the definition of the function as shown, it is safe to call the example function with a temporary string.
Will view inside some_function be referring to a string which has been destroyed?
Not in this case, because the temporary argument string - which the string view refers to - hasn't been destroyed.
What's the difference?
The parameter of the function has shorter lifetime than the lifetime of the temporary passed as the argument. The lifetime of the string view variable is longer than the lifetime of the temporary argument passed to the constructor.
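As a rough illustration of that difference (count_spaces, Holder and keep_copy are hypothetical names, not from the question): a function that only reads the view during the call is fine with a temporary, while anything that must outlive the call should own a std::string instead of storing the view.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Safe with a temporary argument: the view is only read while the
// caller's full-expression (and therefore the temporary) is still alive.
std::size_t count_spaces(std::string_view view) {
    std::size_t n = 0;
    for (char c : view) {
        if (c == ' ') {
            ++n;
        }
    }
    return n;
}

// If the text has to outlive the call, copy it into an owning string
// rather than keeping the string_view itself.
struct Holder {
    std::string text;  // owns its characters
    explicit Holder(std::string_view view) : text(view) {}
};

// The Holder is built from a temporary but does not dangle,
// because the constructor copied the characters.
std::string keep_copy() {
    Holder h{std::string{"hello, world"}};
    assert(h.text.size() == 12);
    return h.text;
}
```

Calling count_spaces(std::string{"hello, world"}) is the same pattern as some_function in the question; Holder shows one way to avoid the dangling case that the clang warning is about.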
Just as others have said, some_function(std::string{"hello, world"}); is safe: the string_view is passed by value, and the temporary std::string it refers to stays alive until the function returns. If safety is all you are concerned with, that will do. If performance could be an issue, I'd recommend using an rvalue reference here, like so:
void some_function(std::string_view&& view)
{
std::cout << "rval reference: " << view << '\n';
}
int main()
{
some_function(std::string{"hello, world"});
}
R-value references are great if you are going to use some_function() mainly for temporary values.
I've just been thinking about the following bit of code:
PerformConflict(m_dwSession,
CONFLICT_DETECTED,
item.GetConflictedFile().GetUnNormalizedPath().c_str(),
item.GetSuggestedFile().GetUnNormalizedPath().c_str());
GetConflictedFile() returns an object.
GetUnNormalizedPath() returns a std::wstring.
c_str() just returns a const wchar_t* (in this case to the contents of an rvalue std::wstring).
My question is: Does anything in the spec guarantee that this code is safe? I.e. are all the rvalue objects guaranteed not to have been destroyed by the time that c_str() is getting a pointer to their contents?
Those temporaries will be destroyed at the end of the full expression they appear in. In your case, that's the entire snippet you posted.
This will be absolutely fine, so long as you only use that const wchar_t* inside that function invocation. If you store it anywhere and try to access it after the call exits, you would be thrust down the deep dark hole of UB.
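A small sketch of the same call shape, assuming made-up File and Item types and a simplified two-argument PerformConflict (none of these are the real classes from the question). A counter records how many File temporaries are alive while the callee runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <utility>

// Hypothetical stand-ins for the types in the question.
int live_files = 0;  // number of File objects currently alive

struct File {
    std::wstring path;
    explicit File(std::wstring p) : path(std::move(p)) { ++live_files; }
    File(const File& other) : path(other.path) { ++live_files; }
    ~File() { --live_files; }
    std::wstring GetUnNormalizedPath() const { return path; }  // returned by value
};

struct Item {
    File GetConflictedFile() const { return File(L"a.txt"); }
    File GetSuggestedFile() const { return File(L"b.txt"); }
};

// Set by PerformConflict: how many File temporaries existed during the call.
int files_alive_during_call = -1;

std::size_t PerformConflict(const wchar_t* conflicted, const wchar_t* suggested) {
    files_alive_during_call = live_files;
    return std::wcslen(conflicted) + std::wcslen(suggested);  // both buffers are readable
}

std::size_t run_conflict(const Item& item) {
    std::size_t n = PerformConflict(
        item.GetConflictedFile().GetUnNormalizedPath().c_str(),
        item.GetSuggestedFile().GetUnNormalizedPath().c_str());
    assert(live_files == 0);  // every temporary is gone once the statement ends
    return n;
}
```

During the call both File temporaries (and the std::wstring temporaries whose buffers are passed) are still alive; they are all destroyed at the end of the statement that contains the call.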
The relevant standards quote is (emphasis mine):
N3337 [class.temporary]/3:
When an implementation introduces a temporary object of a class that has a non-trivial constructor (12.1,
12.8), it shall ensure that a constructor is called for the temporary object. Similarly, the destructor shall be
called for a temporary with a non-trivial destructor (12.4). Temporary objects are destroyed as the last step
in evaluating the full-expression (1.9) that (lexically) contains the point where they were created. This is true
even if that evaluation ends in throwing an exception. The value computations and side effects of destroying
a temporary object are associated only with the full-expression, not with any specific subexpression.
As illustrated by Herb Sutter, rvalues are destroyed at the end of the expression in which they appear. However, if you bind them to "a reference to const on the stack", their lifetime is extended to that of the reference.
So, basically, if your function has this kind of signature:
PerformConflict(...,
...,
const std::string& str1, //< any rvalue passed here will have the same lifetime as str1
const std::string& str2 //< any rvalue passed here will have the same lifetime as str2
);
You should be able to manipulate the strings inside PerformConflict() without problems.
PS: the problem can also be solved if you pass the arguments by value (i.e. const std::string str1)
In 12.2 of C++11 standard:
The temporary to which the reference is bound or the temporary that is
the complete object of a subobject to which the reference is bound
persists for the lifetime of the reference except:
1. A temporary bound to a reference member in a constructor’s ctor-initializer (12.6.2) persists until the constructor exits.
2. A temporary bound to a reference parameter in a function call (5.2.2) persists until the completion of the full-expression containing the call.
3. The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.
4. A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the full-expression containing the new-initializer.
And there is an example of the last case in the standard:
struct S {
int mi;
const std::pair<int,int>& mp;
};
S a { 1,{2,3} }; // No problem.
S* p = new S{ 1, {2,3} }; // Creates dangling reference
To me, 2. and 3. make sense and are easy to agree with. But what's the reason behind 1. and 4.? The example looks just evil to me.
As with many things in C and C++, I think this boils down to what can be reasonably (and efficiently) implemented.
Temporaries are generally allocated on the stack, and the code that calls their constructors and destructors is emitted into the function itself. So if we expand your first example into what the compiler is actually doing, it would look something like:
struct S {
int mi;
const std::pair<int,int>& mp;
};
// Case 1:
std::pair<int,int> tmp{ 2, 3 };
S a { 1, tmp };
The compiler can easily extend the life of the tmp temporary long enough to keep "S" valid because we know that "S" will be destroyed before the end of the function.
But this doesn't work in the "new S" case:
struct S {
int mi;
const std::pair<int,int>& mp;
};
// Case 2:
std::pair<int,int> tmp{ 2, 3 };
// Whoops, this heap object will outlive the stack-allocated
// temporary!
S* p = new S{ 1, tmp };
To avoid the dangling reference, we would need to allocate the temporary on the heap instead of the stack, something like:
// Case 2a -- compiler tries to be clever?
// Note that the compiler won't actually do this.
std::pair<int,int>* tmp = new std::pair<int,int>{ 2, 3 };
S* p = new S{ 1, *tmp };
But then a corresponding delete p would need to free this heap memory! This is quite contrary to the behavior of references, and would break anything that uses normal reference semantics:
// No way to implement this that satisfies case 2a but doesn't
// break normal reference semantics.
delete p;
So the answer to your question is: the rules are defined that way because it is more or less the only practical solution given C++'s semantics around the stack, heap, and object lifetimes.
WARNING: @Potatoswatter notes below that this doesn't seem to be implemented consistently across C++ compilers, and therefore is non-portable at best for now. See his example for how Clang doesn't do what the standard seems to mandate here. He also says that the situation "may be more dire than that" -- I don't know exactly what this means, but it appears that in practice this case in C++ has some uncertainty surrounding it.
The main thrust is that lifetime extension only occurs when the extended lifetime can be easily and deterministically determined, and this can be deduced on the line of code where the temporary is created.
When you call a function, the lifetime is extended to the end of the full-expression containing the call (informally, the end of the current line). That is long enough, and easy to determine.
When you create an automatic storage reference "on the stack", the scope of that automatic storage reference can be deterministically determined. The temporary can be cleaned up at that point. (Basically, create an anonymous automatic storage variable to store the temporary)
In a new expression, the point of destruction cannot be statically determined at the point of creation. It is whenever the delete occurs. If we wanted the delete to (sometimes) destroy the temporary, then our reference "binary" implementation would have to be more complicated than a pointer, instead of less or equal. It would sometimes own the referred to data, and sometimes not. So that is a pointer, plus a bool. And in C++ you don't pay for what you don't use.
The same holds in a constructor, because you cannot know if the constructor was in a new or a stack allocation. So any lifetime extension cannot be statically understood at the line in question.
How long do you want the temporary object to last? It has to be allocated somewhere.
It can't be on the heap because it would leak; there is no applicable automatic memory management. It can't be static because there can be more than one. It must be on the stack. Then it either lasts until the end of the expression or the end of the function.
Other temporaries in the expression, perhaps bound to function call parameters, are destroyed at the end of the expression, and persisting until the end of the function or "{}" scope would be an exception to the general rules. So by deduction and extrapolation of the other cases, the full-expression is the most reasonable lifetime.
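The two uncontroversial cases (a temporary bound to a local reference versus one bound to a reference parameter) can be compared with a short sketch. Noisy, trace, read and demo are hypothetical names for this illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical trace of events, in the order they happen.
std::vector<std::string>& trace() {
    static std::vector<std::string> t;
    return t;
}

struct Noisy {
    int n;
    explicit Noisy(int v) : n(v) {}
    ~Noisy() { trace().push_back("destroy " + std::to_string(n)); }
};

int read(const Noisy& x) { return x.n; }

void demo() {
    trace().clear();
    {
        // Bound to a local reference: lifetime is extended to the end of the block.
        const Noisy& local = Noisy{1};
        trace().push_back("after binding 1");
        assert(local.n == 1);  // still alive on the next line
    }
    trace().push_back("block closed");

    // Bound to a reference parameter: destroyed at the end of the full-expression.
    int value = read(Noisy{2});
    trace().push_back("after call");
    assert(value == 2);
}
```

After demo() the trace reads: "after binding 1", "destroy 1", "block closed", "destroy 2", "after call". In both cases the point of destruction is known statically at the line that creates the temporary, which is the property the answers above rely on.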
I'm not sure why you say this is no problem:
S a { 1,{2,3} }; // No problem.
The dangling reference is the same whether or not you use new.
Instrumenting your program and running it in Clang produces these results:
#include <iostream>
struct noisy {
int n;
~noisy() { std::cout << "destroy " << n << "\n"; }
};
struct s {
noisy const & r;
};
int main() {
std::cout << "create 1 on stack\n";
s a {noisy{ 1 }}; // Temporary created and destroyed.
std::cout << "create 2 on heap\n";
s* p = new s{noisy{ 2 }}; // Creates dangling reference
}
create 1 on stack
destroy 1
create 2 on heap
destroy 2
The object bound to the class member reference does not have an extended lifetime.
Actually I'm sure this is the subject of a known defect in the standard, but I don't have time to delve in right now…
Consider:
#include <iostream>
using std::cout;

class MyObject{
public:
    MyObject() : x(0), y(0) {}  // default constructor, needed for temp below
    MyObject(int,int);
    int x;
    int y;
    MyObject operator =(MyObject rhs);
};
MyObject::MyObject(int xp, int yp){
    x = xp;
    y = yp;
}
// Deliberately "wrong": fills a local object and returns it by value,
// so *this is never modified.
MyObject MyObject::operator =(MyObject rhs){
    MyObject temp;
    temp.x = rhs.x;
    temp.y = rhs.y;
    return temp;
}
int main(){
    MyObject one(1,1);
    MyObject two(2,2);
    MyObject three(3,3);
    one = two = three;
    cout << one.x << ", " << one.y << '\n';
    cout << two.x << ", " << two.y << '\n';
    cout << three.x << ", " << three.y << '\n';
}
By doing this, the variables x and y in one,two and three are unchanged. I know that I should update the member variables for MyObject and use return by reference and return *this for proper behaviour. However, what actually happens to the return values in one = two = three ? Where does the return temp actually end up in the chain, like step by step ?
The call to the assignment operator in
two = three
returns a temporary object as rvalue. This is of type MyObject and is passed on to the next call of the assignment operator
one = t
(I use t to refer to the temporary object.)
Unfortunately, this won't compile because the assignment operator expects a reference MyObject&, and not an rvalue of type MyObject.
(Your code won't compile for various reasons, including uppercased Class and typos, too.)
However, if you were to define an assignment operator that takes an rvalue (i.e. takes the argument by value, const-reference, or indeed by rvalue reference MyObject&& if C++11 is used), the call would work and the temporary object would be copied into the function. Internally, assignments would be made and another temporary object would be returned.
The final temporary object would then go out of scope, i.e. cease to exist. There would be no way to access its contents.
Thanks to Joachim Pileborg and Benjamin Lindley for the helpful comments.
To answer the request for more details: MyObject is a class type, and the C++ Standard includes an entire section on the life cycle of temporary objects of class type (Section 12.2). There are various complex situations that are detailed there in length, and I won't explain them all. But the basic concepts are as follows:
C++ has the notion of expressions. Expressions are, along with declarations and statements, the basic units the code of the program is composed of. For example, a function call f(a,b,c) is an expression, as is an assignment like a = b. Expressions may contain other expressions: a = f(b,c) is a function call nested in an assignment expression. C++ also introduces the concept of full-expressions. In the previous example, c is part of the expression f(b,c), but also of a = f(b,c), and if that is not nested in another expression, we say that a = f(b,c) is the full-expression that lexically contains c.
The Standard defines a variety of situations where temporary objects may be created. One such situation is the returning of an object by value from a function call (aka returning a prvalue, §6.6.3).
The Standard states that the life time of such a temporary object ends when the full-expression that contains it has been fully evaluated:
[...] Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that (lexically) contains the point where they were created. [...]
(Note. The Standard then goes on to define several exceptions to this rule. The case of the return value of your assignment operator, however, is not such an exception.)
Now, what does it mean that an object (of class type) is destroyed? It means, first and foremost, that its destructor is called (§12.2/3). It also means that the storage for that object can no longer be safely accessed. So if you somehow managed to store the address of the temporary object in a pointer before the evaluation of the full-expression has ended, then dereferencing that pointer after the evaluation has ended generally causes undefined behaviour.
In practice, this may in many cases mean the following – I describe the entire life cycle of the temporary object in one possible scenario:
To provide for storage for the temporary, the compiler makes sure that sufficient stack space is allocated when the function that contains the full-expression is entered (this happens before the full-expression is actually evaluated).
During the evaluation of the assignment expression, the temporary is created. The compiler makes sure that its constructor is called to initialise the space that was allocated for it.
Then the contents of the temporary may be accessed or modified in the course of the evaluation of the full-expression it is part of.
When the expression has been fully evaluated (in your case, this moment corresponds to the end of the line that contains the assignment expression), the destructor for the temporary is called. After that it is no longer safe to access the memory that was allocated for it, although in reality that space will continue to be part of the current stack frame until the evaluation of the function in which all of this happens has finished.
But, again, this is only an example of what may happen. The creation of temporaries is in many situations not actually required. The compiler may perform optimizations that mean the temporary is never actually created. In this case, the compiler must nevertheless ensure that it could have been created, e.g. it must ensure that the required constructors and destructors exist (they may never be called though).
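For the chain from the question, a counting sketch can confirm where the temporaries go. This is an assumption-laden variant of the question's class: live_objects and chain_result are hypothetical additions, and the constructor uses default arguments for brevity. Exactly how many temporaries exist in between depends on copy elision, so the sketch only checks the state after the statement.

```cpp
#include <cassert>

int live_objects = 0;  // hypothetical counter of MyObject instances alive

struct MyObject {
    int x, y;
    MyObject(int xp = 0, int yp = 0) : x(xp), y(yp) { ++live_objects; }
    MyObject(const MyObject& o) : x(o.x), y(o.y) { ++live_objects; }
    ~MyObject() { --live_objects; }

    // Same shape as the operator in the question: it fills a local
    // object and returns it by value, leaving *this untouched.
    MyObject operator=(MyObject rhs) {
        MyObject temp;
        temp.x = rhs.x;
        temp.y = rhs.y;
        return temp;
    }
};

// Returns the x value carried by the final temporary of the chain.
int chain_result() {
    MyObject one(1, 1), two(2, 2), three(3, 3);
    int carried = (one = two = three).x;  // read the temporary before it is destroyed
    assert(live_objects == 3);            // all temporaries and parameters are gone
    assert(one.x == 1 && two.x == 2 && three.x == 3);  // operands unchanged
    return carried;
}
```

The value 3 travels through both returned temporaries, but nothing stores it in one or two, and once the statement ends only the three named objects remain.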
If I have the following code:
{
UnicodeString sFish = L"FISH";
char *szFish = AnsiString(sFish).c_str();
CallFunc(szFish);
}
Then what is the scope of the temporary AnsiString that's created, and for how long is szFish pointing to valid data? Will it still be valid for the CallFunc function?
Will its scope last just the one line, or for the whole block?
szFish is invalid before the call to CallFunc(), because AnsiString is a temporary object that is destructed immediately and szFish is pointing to its internal buffer which will have just been deleted.
Ensure that the AnsiString instance is valid for the invocation of CallFunc(). For example:
CallFunc(AnsiString(sFish).c_str());
I would replace:
char *szFish = AnsiString(sFish).c_str();
with:
AnsiString as(sFish);
char *szFish = as.c_str();
I don't know the AnsiString class but in your code its destructor will fire before your call to CallFunc(), and will most probably release the string you point to with *szFish. When you replace the temporary object with a "named" object on stack its lifetime will extend until the end of the block it is defined in.
The C++11 standard §12.2/3 says:
When an implementation introduces a temporary object of a class that
has a non-trivial constructor (12.1, 12.8), it shall ensure that a
constructor is called for the temporary object. Similarly, the
destructor shall be called for a temporary with a non-trivial
destructor (12.4). Temporary objects are destroyed as the last step in
evaluating the full-expression (1.9) that (lexically) contains the
point where they were created. This is true even if that evaluation
ends in throwing an exception. The value computations and side effects
of destroying a temporary object are associated only with the
full-expression, not with any specific subexpression.
(emphasis mine)
There are additional caveats to this, but they don't apply in this situation. In your case the full expression is the indicated part of this statement:
char *szFish = AnsiString(sFish).c_str();
// ^^^^^^^^^^^^^^^^^^^^^^^^^
So, the instant szFish is initialized, the destructor of your temporary object (i.e. AnsiString(sFish)) will be called and its internal memory representation (where c_str() points to) will be released. Thus, szFish will immediately become a dangling pointer, and any access through it is undefined behaviour.
You can get around this by saying
CallFunc(AnsiString(sFish).c_str());
instead, as here, the temporary will be destroyed (again) after the full expression (that is, right at the ;) and CallFunc will be able to read the raw string.
The scope of the AnsiString in this case is "from right before the call to c_str(), until right after."
It may help to think of it this way:
char *szFish;
{
AnsiString tmpString(sFish);
szFish = tmpString.c_str();
}
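The three variants discussed in this thread can be compared without ever touching a dangling pointer. In this sketch Narrow is a hypothetical stand-in for AnsiString, and buffer_alive is an invented flag that tracks whether the temporary still owns its buffer; CallFunc here only reports that flag and never reads through the pointer.

```cpp
#include <cassert>
#include <string>

bool buffer_alive = false;  // hypothetical flag: true while a Narrow object exists

// Hypothetical stand-in for AnsiString: owns a character buffer.
class Narrow {
public:
    explicit Narrow(const std::string& s) : text_(s) { buffer_alive = true; }
    ~Narrow() { buffer_alive = false; }
    const char* c_str() const { return text_.c_str(); }
private:
    std::string text_;
};

// Reports whether the buffer behind the pointer still exists; it does not read it.
bool CallFunc(const char*) { return buffer_alive; }

// Variant from the question: the temporary dies at the end of the first statement.
bool two_statements(const std::string& sFish) {
    const char* szFish = Narrow(sFish).c_str();
    (void)szFish;         // dangling from here on, so it is deliberately not used
    return buffer_alive;  // state at the point where CallFunc(szFish) would run
}

// Fix 1: keep everything in one full-expression.
bool one_expression(const std::string& sFish) {
    bool alive = CallFunc(Narrow(sFish).c_str());
    assert(!buffer_alive);  // the temporary is destroyed right after the call
    return alive;
}

// Fix 2: give the object a name so it lives until the end of the block.
bool named_object(const std::string& sFish) {
    Narrow as(sFish);
    return CallFunc(as.c_str());
}
```

two_statements("FISH") returns false (the buffer is already gone when the call would happen), while one_expression("FISH") and named_object("FISH") both return true.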