My friends and I are having a heated discussion about this code:
#include <iostream>
#include <string>
using namespace std;
string getString() {
return string("Hello, world!");
}
int main() {
char const * str = getString().c_str();
std::cout << str << "\n";
return 0;
}
This code produces different outputs on g++, clang and vc++:
g++ and clang output is the same:
Hello, world!
However vc++ outputs nothing (or just spaces):
Which behavior is correct? Could this be due to a change in the standard regarding the lifetime of temporaries?
As far as I can tell from reading clang++'s IR, it works as follows:
store `getString()`'s return value in %1
std::cout << %1.c_str() << "\n";
destruct %1
Personally, I think gcc works this way too (I've tested it with RVO/move tracing, using custom ctors and dtors which print to std::cout). Why does vc++ work differently?
clang = Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
g++ = gcc version 4.9.2 (Debian 4.9.2-10)
Your program has undefined behaviour! You are "printing" a dangling pointer.
The result of getString(), a temporary string, lives no longer than that const char* declaration; accordingly, neither does the buffer returned by invoking c_str() on that temporary.
So both compilers are "correct"; it is you and your friends who are wrong.
This is why we shall not store the result of std::string::c_str(), unless we really, really need to.
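One way to keep the pointer valid is to give the string a name, so that it outlives every use of the pointer. A minimal sketch, reusing getString() from the question; the helper name safe_length is made up for illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

std::string getString() {
    return std::string("Hello, world!");
}

// Hypothetical helper: measures the string through a pointer that is still valid.
std::size_t safe_length() {
    const std::string owner = getString();  // named object, lives until the end of this scope
    const char* str = owner.c_str();        // valid while owner is alive and not modified
    return std::strlen(str);
}
```

The pointer never escapes the scope of owner, so it cannot dangle.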
Both are right, undefined behaviour is undefined.
char const * str = getString().c_str();
getString() returns a temporary, which will be destroyed at the end of the full expression which contains it. So after that line is finished, str is an invalid pointer and trying to inspect it will plunge you into the land of undefined behaviour.
Some standards quotes, as requested (from N4140):
[class.temporary]/3: Temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created.
basic_string::c_str is specified like so:
[string.accessors]/1: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
Since strings have their contents stored contiguously ([string.require]/4) this essentially means "return a pointer to the start of the buffer".
Obviously when a std::string is destructed it will reclaim any memory which was allocated, making that pointer invalid (if your friends don't believe that, they have other problems).
That is undefined behavior so anything can happen (including printing the string "correctly").
Things appearing to "work" anyway happens quite often with UB, unless the program is actually running on a paying customer's computer or being shown on the big screen in front of a vast audience ;-)
The problem is that you're taking a const char * pointing inside a temporary object that is destroyed before your use of the pointer.
Note that this is not the same situation as with:
const std::string& str = getString(); // Returns a temporary
std::cout << str << "\n";
because in this case instead there is a very specific rule about references bound to temporaries in the C++ standard. In this case the lifetime of the temporary will be extended until the reference str is also destroyed. The rule only applies to references and only if directly bound to the temporary or to a sub-object of the temporary (like const std::string& s = getObj().s;) and not to the result of calling methods of a temporary object.
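The difference between the two cases can be observed by counting destructor calls. This is only a sketch: Tracked, make and the two helper functions are hypothetical names, and it assumes the copy in make() is elided (guaranteed since C++17, and done by g++ and clang by default before that).

```cpp
#include <string>

// Hypothetical type that counts how many times its destructor has run.
struct Tracked {
    static int destroyed;
    std::string s;
    Tracked() : s("payload") {}
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

Tracked make() { return Tracked(); }

int destroyed_while_reference_bound() {
    Tracked::destroyed = 0;
    const Tracked& ref = make();  // temporary's lifetime is extended to that of ref
    (void)ref;
    return Tracked::destroyed;    // ref is still in scope when this is read
}

int destroyed_after_member_call() {
    Tracked::destroyed = 0;
    const char* p = make().s.c_str();  // no extension: the temporary dies at the end of this statement
    (void)p;                           // p is dangling here and must not be dereferenced
    return Tracked::destroyed;
}
```

Under those assumptions the first function should observe no destructor call while the reference is alive, and the second should observe exactly one.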
My question is in relation to this little code snippet:
typedef std::map<std::string, std::string> my_map_t;
std::string_view get_value_worse(const my_map_t& input_map, std::string_view value)
{
auto retrieved = input_map.find(value.data());
return retrieved != input_map.cend() ? retrieved->second : "";
}
std::string_view get_value_better(const my_map_t& input_map, std::string_view value)
{
auto retrieved = input_map.find(value.data());
if (retrieved != input_map.cend())
{
return retrieved->second;
}
return "";
}
int main()
{
my_map_t my_map = {
{"key_0", "value_0"},
{"key_1", "value_1"},
};
std::cout << (get_value_worse(my_map, "key_0") == get_value_better(my_map, "key_0")) << std::endl;
}
Under the latest gcc with no optimisations this prints 0 for false, while under -O3 this prints 1 for true.
I believe the un-optimised behaviour is because the second and third operands of the conditional operator are expressions, not statements - and so the retrieved->second in retrieved != input_map.cend() ? retrieved->second : "" gets evaluated as a string construction on the stack, and returning a string_view to that is bad.
I can also see that with -O3 the compiler would be able to inline all of this, remove branching, and be done with it... but I would have expected -O3 to act exactly "as if" I had compiled with -O0.
Can anyone explain why the compiler gets to elide the copy construction I believe is happening in the -O0 version?
In the conditional expression, a temporary std::string object is constructed. Temporary objects are usually constructed on the stack, although this is an implementation detail that is not important. The important thing is that the temporary object is destroyed at the end of the return statement, so the returned std::string_view is dangling. Attempting to access the data it points to (using the == operator, or otherwise) results in undefined behaviour.
When a program contains undefined behaviour, the compiler can do whatever it wants with it. In particular, the compiler is permitted to optimize by assuming that a condition that implies undefined behaviour will always be false. If it turns out that this assumption is wrong, then the compiler is off the hook (because it means undefined behaviour is occurring). What kind of assumption exactly is being made by your compiler is not clear. It also doesn't really matter, because you can't depend on the behaviour that you see now. You should just rewrite your program to remove the undefined behaviour.
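The temporary appears because the two branches of the conditional operator have different types (a const std::string and a string literal), so the result is a std::string prvalue. One possible way to remove it is to make both branches std::string_view already. A sketch only: the function name get_value is invented here, and the key is taken as const std::string& as a simplification of the original signature.

```cpp
#include <map>
#include <string>
#include <string_view>

using my_map_t = std::map<std::string, std::string>;

// Both branches have type std::string_view, so no temporary std::string is created.
std::string_view get_value(const my_map_t& input_map, const std::string& key) {
    auto retrieved = input_map.find(key);
    return retrieved != input_map.cend() ? std::string_view(retrieved->second)
                                         : std::string_view();
}
```

The returned view points into the map's own string, so it stays valid as long as that element is neither erased nor modified.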
I've tried to code like this several times:
struct Foo
{
double const& f;
Foo(double const& fx) : f(fx)
{
printf("%f %f\n", fx, this->f); // 125 125
}
double GetF() const
{
return f;
}
};
int main()
{
Foo p(123.0 + 2.0);
printf("%f\n", p.GetF()); // 0
return 0;
}
But it doesn't crash at all. I've also used valgrind to test the program but no error or warning occurred. So, I assume that the compiler automatically generated code directing the reference to another hidden variable. But I'm really not sure.
No, this is not safe. More precisely, this is UB, which means anything is possible.
When you pass 123.0 + 2.0 to the constructor of Foo, a temporary double will be constructed and bound to the parameter fx. The temporary will be destroyed after the full expression (i.e. Foo p(123.0 + 2.0);), and the reference member f will then dangle.
Note that the temporary's lifetime won't be extended to the lifetime of the reference member f.
In general, the lifetime of a temporary cannot be further extended by "passing it on": a second reference, initialized from the reference to which the temporary was bound, does not affect its lifetime.
And from the standard, [class.base.init]/8
A temporary expression bound to a reference member in a
mem-initializer is ill-formed. [ Example:
struct A {
A() : v(42) { } // error
const int& v;
};
— end example ]
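If the object does not actually need to refer to the caller's double, the simplest fix is probably to store a copy. A minimal sketch; FooByValue and value_after_temporary_is_gone are hypothetical names, not part of the question:

```cpp
struct FooByValue {
    double f;  // owns a copy, so nothing can dangle
    explicit FooByValue(double const& fx) : f(fx) {}
    double GetF() const { return f; }
};

double value_after_temporary_is_gone() {
    FooByValue p(123.0 + 2.0);  // the temporary dies after this statement, the copy does not
    return p.GetF();
}
```

For a small type such as double the copy costs nothing worth worrying about; the trade-off matters more for large objects.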
But it doesn't crash at all. I've also used valgrind to test the program but no error or warning occurred.
Ah, the joy of debugging undefined behaviour. It's possible that the compiler compiles invalid code to something where tools can no longer detect that it's invalid, and that's what happens here.
From the OS perspective, and from valgrind's perspective, the memory that f references is still valid, therefore it doesn't crash, and valgrind doesn't report anything wrong. The fact that you see an output value of 0 means the compiler has, in your case, re-used the memory that was formerly used for the temporary object to store some other unrelated value.
It should be clear that attempts to access that unrelated value through a reference to an already-deleted object are invalid.
Is it safe to make a const reference member to a temporary variable?
Yes, as long as the reference is used only while the lifetime of the "temporary" variable has not ended. In the code you posted, you are holding on to a reference past the lifetime of the referenced object. (i.e. not good)
So, I assume that the compiler automatically generated code directing the reference to another hidden variable.
No, that's not quite what's happening.
On my machine your print statement in main prints 125 instead of 0, so first let's duplicate your results:
#include <alloca.h>
#include <cstring>
#include <iostream>
struct Foo
{
double const& f;
Foo(double const& fx) : f(fx)
{
std::cout << fx << " " << this->f << std::endl;
}
double GetF() const
{
return f;
}
};
Foo make_foo()
{
return Foo(123.0 + 2.0);
}
int main()
{
Foo p = make_foo();
void * const stack = alloca(1024);
std::memset(stack, 0, 1024);
std::cout << p.GetF() << std::endl;
return 0;
}
Now it prints 0!
123.0 and 2.0 are floating point literals. Their sum is an rvalue that is materialized during the construction of the Foo object, since Foo's constructor requires a reference to a double. That temporary double exists in memory on the stack.
References are usually implemented to hold the machine address of the object they reference, which means Foo's reference member is holding a stack memory address. The object that exists at that address when Foo's constructor is called, does not exist after the constructor completes.
On my machine, that stack memory is not automatically zeroed when the lifetime of the temporary ends, so in your code the reference returns the (former) object's value. In my code, when I reuse the stack memory previously occupied by the temporary (via alloca and memset), that memory is (correctly) overwritten and future uses of the reference reflect the state of the memory at the address, which no longer has any relationship to the temporary. In both cases the memory address is valid, so no segfault is triggered.
I added make_foo and used alloca and std::memset because of some compiler-specific behavior and so I could use the intuitive name "stack", but I could have just as easily done this instead which achieves similar results:
Foo p = Foo(123.0 + 2.0);
std::vector<unsigned char> v(1024, 0);
std::cout << p.GetF() << std::endl;
This is indeed unsafe (it has undefined behavior), and the asan AddressSanitizerUseAfterScope will detect this:
$ g++ -ggdb3 a.cpp -fsanitize=address -fsanitize-address-use-after-scope && ./a.out
125.000000 125.000000
=================================================================
==11748==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff1bbfdab0 at pc 0x000000400b80 bp 0x7fff1bbfda20 sp 0x7fff1bbfda18
READ of size 8 at 0x7fff1bbfdab0 thread T0
#0 0x400b7f in Foo::GetF() const a.cpp:12
#1 0x4009ca in main a.cpp:18
#2 0x7fac0bd05d5c in __libc_start_main (/lib64/libc.so.6+0x1ed5c)
#3 0x400808 (a.out+0x400808)
Address 0x7fff1bbfdab0 is located in stack of thread T0 at offset 96 in frame
#0 0x4008e6 in main a.cpp:16
This frame has 2 object(s):
[32, 40) 'p'
[96, 104) '<unknown>' <== Memory access at offset 96 is inside this variable
In order to use AddressSanitizerUseAfterScope, you need to run Clang 5.0 or gcc 7.1.
Valgrind is good at detecting invalid use of heap memory, but because it runs on an unaltered program file it cannot in general detect stack use bugs.
Your code is unsafe because the parameter double const& fx is bound to a temporary, a materialized prvalue double with value 125.0. This temporary has lifetime terminating at the end of the statement-expression Foo p(123.0 + 2.0).
One way to make your code safe is to use aggregate lifetime extension (Extending temporary's lifetime through rvalue data-member works with aggregate, but not with constructor, why?), by removing the constructor Foo::Foo(double const&), and changing the initializer of p to use the list-initialization syntax:
Foo p{123.0 + 2.0};
// ^ ^
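Put together, the aggregate version might look like the sketch below. FooAggregate and read_through_aggregate are names chosen for illustration; the point is that there is no user-declared constructor and that braces are used, so the reference member is bound directly to the temporary.

```cpp
struct FooAggregate {
    double const& f;  // no constructor: FooAggregate is an aggregate
    double GetF() const { return f; }
};

double read_through_aggregate() {
    FooAggregate p{123.0 + 2.0};  // the temporary now lives as long as p
    return p.GetF();
}
```

Note that this relies on brace initialization of an aggregate; adding a constructor back would reintroduce the dangling reference.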
If the temporary variable exists at the point where the reference is used, then the behavior is well defined. And in this case this temporary variable exists exactly because it is referenced! From the C++11 standard, section 12.2/5:
The temporary to which the reference is bound or the temporary that is
the complete object of a subobject to which the reference is bound
persists for the lifetime of the reference ...
Yes, the word hidden by '...' is the "except" and multiple exceptions are listed there, but none of them are applicable in this example case. So this is legal and well defined, should produce no warnings, but not very widely known corner case.
If the temporary variable exists at the point where the reference is used, then the behaviour is well defined.
If the temporary ceases to exist before the reference is used, then the behaviour of using the reference is undefined.
Unfortunately, your code is an example of the latter. The temporary which holds the result of 123.0 + 2.0 ceases to exist when the statement Foo p(123.0 + 2.0) finishes. The next statement printf("%f\n", p.GetF()) then accesses a reference to that temporary which no longer exists.
Generally speaking, undefined behaviour is considered unsafe - it means there is no requirement on what the code actually does. The result you are seeing in testing is not guaranteed.
As others say it is currently unsafe. So it should be compile-time checked. So when one stores the reference one should also forbid rvalues:
Foo(double &&)=delete;
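In context, the deleted overload could sit next to the original constructor as in this sketch (the explicit keyword is an addition here, not part of the question). Overload resolution prefers the double&& overload for rvalues, so passing a temporary becomes a compile error, while lvalues still bind to the const reference overload.

```cpp
#include <type_traits>

struct Foo {
    double const& f;
    explicit Foo(double const& fx) : f(fx) {}
    Foo(double&&) = delete;  // temporaries are rejected at compile time
    double GetF() const { return f; }
};

// Lvalues are accepted, rvalues are not.
static_assert(std::is_constructible<Foo, double&>::value, "lvalue should be accepted");
static_assert(!std::is_constructible<Foo, double>::value, "rvalue should be rejected");
```

With this in place, Foo p(123.0 + 2.0); no longer compiles, whereas double x = 125.0; Foo p(x); still does. The caller remains responsible for keeping x alive as long as p.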
I am not 100% sure that the following code is semantically correct:
#include <iostream>
#include <experimental/string_view>
int main()
{
std::string str = "lvalue string";
std::experimental::string_view view_lvalue(str);
std::experimental::string_view view_rvalue(std::string{"rvalue string"});
std::cout << view_lvalue << '\n' << view_rvalue << '\n';
}
Question: Can I legally bind an rvalue to std::experimental::basic_string_view, or is it just UB? If yes, how does it work? As far as I know, an rvalue does not bind to a const reference (which I assume the view holds to the original string) via the constructor, so I thought that at the end of the statement std::experimental::string_view view_rvalue(std::string{"rvalue string"}); the reference will be dangling. Does string_view use a more sophisticated approach?
I am asking this because I am trying to write a similar view for some matrix class, and don't yet know how to deal with rvalues (I can disable them of course, but I don't think it's the best approach).
If cppreference is correct then this is UB. About std::string_view it says:
A typical implementation holds only two members: a pointer to constant CharT and a size.
And the constructor has
Constructs a view of the first str.size() characters of the character array starting with the element pointed by str.data().
So if string_view just points to the underlying char array of the provided string then we will have a dangling pointer once the expression ends and the temporary is destroyed.
As pointed out in the comments, one reason this behavior may have been allowed is so that you can pass a string_view to a function and construct that string_view from a temporary string.
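That function-argument use can be sketched as follows, using C++17 std::string_view rather than the experimental header. count_spaces and the two wrapper functions are hypothetical; what matters is that the view is only read while the string it points into is still alive.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical consumer that only reads the view during the call.
std::size_t count_spaces(std::string_view text) {
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;
}

std::size_t spaces_in_temporary() {
    // Fine: the temporary std::string lives until the end of the full-expression,
    // which includes the whole call to count_spaces.
    return count_spaces(std::string{"rvalue string"});
}

std::size_t spaces_in_named_string() {
    const std::string str = "lvalue string";
    const std::string_view view(str);  // fine: str outlives view
    return count_spaces(view);
}
```

Storing the view from spaces_in_temporary in a variable and reading it in a later statement would be the dangling case described above.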
According to Herb Sutter's article http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/, the following code is correct:
#include <iostream>
#include <vector>
using namespace std;
vector<vector<int>> f() { return {{1},{2},{3},{4},{5}}; }
int main()
{
const auto& v = f();
cout << v[3][0] << endl;
}
i.e. the lifetime of the temporary returned by f() is extended to the lifetime of the const reference v.
And indeed this compiles fine with gcc and clang and runs without leaks according to valgrind.
However, when I change the main function thusly:
int main()
{
const auto& v = f()[3];
cout << v[0] << endl;
}
it still compiles, but valgrind warns me of invalid reads in the second line of the function because the memory was freed in the first line.
Is this standard compliant behaviour or could this be a bug in both g++ (4.7.2) and clang (3.5.0-1~exp1)?
If it is standard compliant, it seems pretty weird to me... oh well.
There's no bug here except in your code.
The first example works because, when you bind the result of f() to v, you extend the lifetime of that result.
In the second example you don't bind the result of f() to anything, so its lifetime is not extended. Binding to a subobject of it would count:
[C++11: 12.2/5]: The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except: [..]
…but you're not doing that: you're binding to the result of calling a member function (e.g. operator[]) on the object, and that result is not a data member of the vector!
(Notably, if you had an std::array rather than an std::vector, then the code† would be absolutely fine as array data is stored locally, so elements are subobjects.)
So, you have a dangling reference to a logical element of the original result of f() which has long gone out of scope.
† Sorry for the horrid initializers but, well, blame C++.
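Two ways to get at the element without a dangling reference, as a sketch (the function names are made up for illustration): either bind the whole result first and index into it afterwards, or copy the element out while the temporary still exists.

```cpp
#include <vector>

std::vector<std::vector<int>> f() { return {{1}, {2}, {3}, {4}, {5}}; }

int element_via_extended_temporary() {
    const auto& all = f();   // the whole result is bound, so its lifetime is extended
    const auto& v = all[3];  // refers into all, which is still alive
    return v[0];
}

int element_via_copy() {
    const auto v = f()[3];   // no reference: the element is copied before the temporary dies
    return v[0];
}
```

The first variant avoids copying the inner vector; the second is shorter and has no lifetime dependency at all.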
I have tested this code:
#include <iostream>
#include <cstdio>
#include <string>
using namespace std;
int main()
{
string s1("a"),s2("b");
const char * s = (s1+s2).c_str();
printf("%s\n",s);
}
It prints "ab".
As far as I know, since (s1 + s2) is a temporary object and may disappear at some point (I have no idea when), const char * s may end up pointing to invalid memory and the program may crash.
So is it safe to use the .c_str() like that?
It's not safe in your example. It's safe however in
printf("%s\n", (a + b).c_str());
The reason is that temporary values (like the result of a + b) are destroyed at the end of the full expression. In your example the const char * survives the full expression containing the temporary and dereferencing it is undefined behaviour.
The worst part of "undefined behaviour" is that things may apparently work anyway... (UB code crashes only if you're making your demo in front of a vast audience that includes your parents ;-) )
In that example we can just quote the standard:
12.2 Temporary objects [class.temporary]
Temporary objects are destroyed as the last step in evaluating the full-expression (1.9) that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception. The value computations and side effects of destroying a temporary object are associated only with the full-expression, not with any specific subexpression.
That is after the semicolon of your line:
const char * s = (s1+s2).c_str(); // <- Here
So here:
printf("%s\n",s); // This line will now cause undefined behaviour.
Why? Because as your object is destructed, you don't know anymore what is at this place now...
The bad thing here is that, with undefined behaviour, your program may seem to work at first, but it can crash at the worst possible time.
You can do:
printf( "%s\n", (s1+s2).c_str() );
It will work because the object is not destructed yet (remember, after the semicolon...).
It's not safe, but you can easily assign to a new variable, and the pointer will be safe in the scope of that variable:
string s1("a"), s2("b") , s3;
s3 = s1 + s2;
printf("%s\n", s3.c_str());
//other operations with s3
Like most programming constructs, it's "safe" if you use it correctly, and it's not "safe" if you're sloppy. In this case, using it correctly means paying attention to object lifetimes. The + operator creates a temporary object which gets destroyed at the end of the statement, and the returned const char* is no longer valid after the statement that created it. So you can pass the result of c_str() directly to a function, but you can't save the pointer and use it later.
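Both correct uses can be sketched like this. The function names are hypothetical, and std::strlen merely stands in for any function that reads the C string during the call.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// The temporary (s1 + s2) is alive for the whole call to std::strlen.
std::size_t length_within_full_expression(const std::string& s1, const std::string& s2) {
    return std::strlen((s1 + s2).c_str());
}

// A named string keeps the buffer alive across several statements.
std::string read_back_through_pointer(const std::string& s1, const std::string& s2) {
    const std::string joined = s1 + s2;
    const char* s = joined.c_str();  // valid while joined is in scope and unmodified
    return std::string(s);           // copies the characters before joined is destroyed
}
```

In both functions the pointer is consumed before the string that owns the buffer goes away.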