C++ - "Most important const" doesn't work with expressions? - c++

According to Herb Sutter's article http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/, the following code is correct:
#include <iostream>
#include <vector>
using namespace std;
vector<vector<int>> f() { return {{1},{2},{3},{4},{5}}; }
int main()
{
const auto& v = f();
cout << v[3][0] << endl;
}
i.e. the lifetime of the temporary returned by f() is extended to the lifetime of the const reference v.
And indeed this compiles fine with gcc and clang and runs without leaks according to valgrind.
However, when I change the main function thusly:
int main()
{
const auto& v = f()[3];
cout << v[0] << endl;
}
it still compiles, but valgrind warns me of invalid reads in the second line of the function because the memory was freed in the first line.
Is this standard compliant behaviour or could this be a bug in both g++ (4.7.2) and clang (3.5.0-1~exp1)?
If it is standard compliant, it seems pretty weird to me... oh well.

There's no bug here except in your code.
The first example works because, when you bind the result of f() to v, you extend the lifetime of that result.
In the second example you don't bind the result of f() to anything, so its lifetime is not extended. Binding to a subobject of it would count:
[C++11: 12.2/5]: The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except: [..]
…but you're not doing that: you're binding to the result of calling a member function (e.g. operator[]) on the object, and that result is not a data member of the vector!
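The difference between binding to a real subobject and binding to the result of a member function can be sketched with a small probe type. Part, Whole and the two functions below are hypothetical names made up for this illustration; the destructor clears a flag so the lifetime can be observed without ever reading through a dangling reference.

```cpp
// Hypothetical types for illustration only; they do not appear in the question.
struct Part { int value; };

struct Whole {
    bool* alive;  // flag recording whether this object is still alive
    Part part;    // a genuine data member, i.e. a subobject of Whole
    explicit Whole(bool* flag) : alive(flag), part{4} { *alive = true; }
    ~Whole() { *alive = false; }
    const Part& get() const { return part; }  // same data, but via a function call
};

// Binds the reference to a data member of the temporary: the whole
// temporary is kept alive for as long as the reference lives.
bool alive_after_member_access() {
    bool alive = false;
    const Part& p = Whole(&alive).part;
    (void)p;
    return alive;
}

// Binds the reference to the result of a member function: no lifetime
// extension, so the temporary dies at the end of the declaration.
// The dangling reference is deliberately never read.
bool alive_after_member_function() {
    bool alive = false;
    const Part& p = Whole(&alive).get();
    (void)p;
    return alive;
}
```

Under the quoted rule, the first function should report the temporary as still alive and the second as already destroyed.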
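(Notably, switching to an std::array rather than an std::vector would not rescue the code† either: array data is stored locally, so elements are subobjects, but std::array::operator[] is still a member function returning a reference, and binding to its result does not extend the temporary's lifetime. Only direct subobject access, such as naming a data member of the temporary, does.)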
So, you have a dangling reference to a logical element of the original result of f() which has long gone out of scope.
† Sorry for the horrid initializers but, well, blame C++.
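Two ways to get at the element safely, as a minimal sketch that reuses f() from the question. The helper function names are made up for this example.

```cpp
#include <vector>

std::vector<std::vector<int>> f() { return {{1}, {2}, {3}, {4}, {5}}; }

// Option 1: bind the reference to the whole temporary, then index it.
int read_via_extended_temporary() {
    const auto& all = f();     // lifetime of the temporary is extended
    const auto& row = all[3];  // refers into `all`, which is still alive
    return row[0];
}

// Option 2: copy the element out before the temporary is destroyed.
int read_via_copy() {
    const auto row = f()[3];   // `row` is an independent std::vector<int>
    return row[0];
}
```

Option 1 keeps all five inner vectors alive; option 2 pays for one copy of the selected element but holds nothing else.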

Related

Apparently you can modify const values w/o UB. Or can you?

---- Begin Edit ----
User #user17732522 pointed out that the flaw invoking UB is that pop_back() invalidates the references in use, according to the vector library documentation. Constant evaluation is not required to detect this when it occurs, because it is library UB rather than core language UB.
However, the fix, which was also pointed out by #user17732522, is simple. Replace occurrences of these two consecutive lines of code:
v.pop_back();
v.emplace_back(...);
with these two lines:
std::destroy_at(&v[0]); // optional since A has a trivial destructor
std::construct_at<A, int>(&v[0], ...);
While references are invalidated upon the destroy_at, they are reified automatically by the construct_at. See: https://eel.is/c++draft/basic#life-8.
---- Begin Original ----
It's well established that you can't modify const values unless they were originally non-const. But there appears to be an exception: vectors containing objects with const members.
Here's how:
#include <vector>
#include <iostream>
struct A {
constexpr A(int arg) : i{ arg } {}
const int i;
};
int main()
{
std::vector<A> v;
v.emplace_back(1); // vector of one A initialized to i:1
A& a = v[0];
// prints: 1 1
std::cout << v[0].i << " " << a.i << '\n';
//v.resize(0); // ending the lifetime of A but now reusing the same storage
v.pop_back();
v.emplace_back(2); // vector of one A initialized to i:2
// prints: 2 2
std::cout << v[0].i << " " << a.i << '\n';
}
Now this seems to violate the general rule that you can't change const values. But using consteval to force the compiler to flag UB, we can see that it is not UB:
consteval int foo()
{
std::vector<A> v;
v.emplace_back(1); // vector of one A initialized to i:1
A& a = v[0];
v.pop_back();
v.emplace_back(2);
return a.i;
}
// verification the technique doesn't produce UB
constexpr int c = foo();
So either this is an example of modifying a const member inside a vector w/o UB or the UB detection using consteval is flawed. Which is it or am I missing something else?
It is UB to modify a const object.
However it is generally not UB to place a new object into storage previously occupied by a const object. That is UB only if the storage was previously occupied by a const complete object (a member subobject is not a complete object) and also doesn't apply to dynamic storage, which a std::vector will likely use to store elements.
std::vector is specified to allow creating new objects in it after removing previous ones, no matter the type, so it must implement in some way that works in any case.
What you are doing has undefined behavior for a different reason. You are taking a reference to the vector element at v[0]; and then you pop that element from the vector. std::vector's specification says that this invalidates the reference. Consequently, reading from the reference afterwards with a.i has undefined behavior. (But not via v[0]).
So your code has undefined behavior (since you are doing this in both examples).
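A minimal sketch of the variant that avoids the invalidated reference: obtain the reference only after the vector has been modified. A mirrors the struct from the question (without constexpr, so it builds with older compilers); the function name is made up for this example.

```cpp
#include <vector>

struct A {
    explicit A(int arg) : i{arg} {}
    const int i;
};

int read_after_replacement() {
    std::vector<A> v;
    v.emplace_back(1);      // vector of one A initialized to i:1
    v.pop_back();           // invalidates references to the removed element
    v.emplace_back(2);      // a new A now lives in the vector
    const A& fresh = v[0];  // reference obtained after the modification
    return fresh.i;
}
```

No const object is modified here: the old A is destroyed and a different A is created, which is what std::vector is specified to allow.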
However, it is unspecified whether UB in standard library clauses of the standard such as using the invalidated reference needs to be diagnosed in constant expression evaluation. Only core language undefined behavior needs to be diagnosed (and even then there are exceptions that obviously shouldn't be required to be diagnosed, since it is impossible, e.g. order-of-evaluation undefined behavior). Therefore the compiler does not need to give you an error for the UB here.
Also, consteval is not required here. constexpr would have done the same since you already force constant evaluation with constexpr on the variable.

Is it safe to make a const reference member to a temporary variable?

I've tried to code like this several times:
#include <cstdio>
struct Foo
{
double const& f;
Foo(double const& fx) : f(fx)
{
printf("%f %f\n", fx, this->f); // 125 125
}
double GetF() const
{
return f;
}
};
int main()
{
Foo p(123.0 + 2.0);
printf("%f\n", p.GetF()); // 0
return 0;
}
But it doesn't crash at all. I've also used valgrind to test the program, but no error or warning occurred. So, I assume that the compiler automatically generated code directing the reference to another hidden variable. But I'm really not sure.
No, this is not safe. More precisely, this is UB, which means anything is possible.
When you pass 123.0 + 2.0 to the constructor of Foo, a temporary double will be constructed and bound to the parameter fx. The temporary will be destroyed after the full expression (i.e. Foo p(123.0 + 2.0);), and the reference member f will then dangle.
Note that the temporary's lifetime won't be extended to the lifetime of the reference member f.
In general, the lifetime of a temporary cannot be further extended by "passing it on": a second reference, initialized from the reference to which the temporary was bound, does not affect its lifetime.
And from the standard, [class.base.init]/8
A temporary expression bound to a reference member in a
mem-initializer is ill-formed. [ Example:
struct A {
A() : v(42) { } // error
const int& v;
};
— end example ]
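The "passing it on" rule can be observed without touching the dangling reference at all. The sketch below uses hypothetical Probe and Holder types (not from the question); Probe clears a flag in its destructor so the point of destruction is visible.

```cpp
// Hypothetical helper: records in *alive whether the object still exists.
struct Probe {
    bool* alive;
    explicit Probe(bool* flag) : alive(flag) { *alive = true; }
    ~Probe() { *alive = false; }
};

// Same shape as Foo in the question: a reference member initialized
// from a reference parameter.
struct Holder {
    const Probe& ref;
    explicit Holder(const Probe& p) : ref(p) {}
};

// The temporary is bound to the constructor parameter, not to the member,
// so it is destroyed at the end of the declaration of h.
bool alive_after_passing_on() {
    bool alive = false;
    Holder h{Probe(&alive)};  // braces avoid the most vexing parse
    (void)h;                  // h.ref now dangles and is never read
    return alive;
}

// For comparison: binding a local const reference directly to the
// temporary does extend its lifetime.
bool alive_after_direct_binding() {
    bool alive = false;
    const Probe& r = Probe(&alive);
    (void)r;
    return alive;
}
```

The first function should report the temporary as already destroyed, the second as still alive.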
But it doesn't crash at all. I've also used valgrind to test the program, but no error or warning occurred.
Ah, the joy of debugging undefined behaviour. It's possible that the compiler compiles invalid code to something where tools can no longer detect that it's invalid, and that's what happens here.
From the OS perspective, and from valgrind's perspective, the memory that f references is still valid, therefore it doesn't crash, and valgrind doesn't report anything wrong. The fact that you see an output value of 0 means the compiler has, in your case, re-used the memory that was formerly used for the temporary object to store some other unrelated value.
It should be clear that attempts to access that unrelated value through a reference to an already-deleted object are invalid.
Is it safe to make a const reference member to a temporary variable?
Yes, as long as the reference is used only while the lifetime of the "temporary" variable has not ended. In the code you posted, you are holding on to a reference past the lifetime of the referenced object. (i.e. not good)
So, I assume that the compiler automatically generated a code directing the reference to another hidden variable.
No, that's not quite what's happening.
On my machine your print statement in main prints 125 instead of 0, so first let's duplicate your results:
#include <alloca.h>
#include <cstring>
#include <iostream>
struct Foo
{
double const& f;
Foo(double const& fx) : f(fx)
{
std::cout << fx << " " << this->f << std::endl;
}
double GetF() const
{
return f;
}
};
Foo make_foo()
{
return Foo(123.0 + 2.0);
}
int main()
{
Foo p = make_foo();
void * const stack = alloca(1024);
std::memset(stack, 0, 1024);
std::cout << p.GetF() << std::endl;
return 0;
}
Now it prints 0!
123.0 and 2.0 are floating point literals. Their sum is an rvalue that is materialized during the construction of the Foo object, since Foo's constructor requires a reference to a double. That temporary double exists in memory on the stack.
References are usually implemented to hold the machine address of the object they reference, which means Foo's reference member is holding a stack memory address. The object that exists at that address when Foo's constructor is called does not exist after the constructor completes.
On my machine, that stack memory is not automatically zeroed when the lifetime of the temporary ends, so in your code the reference returns the (former) object's value. In my code, when I reuse the stack memory previously occupied by the temporary (via alloca and memset), that memory is (correctly) overwritten and future uses of the reference reflect the state of the memory at the address, which no longer has any relationship to the temporary. In both cases the memory address is valid, so no segfault is triggered.
I added make_foo and used alloca and std::memset because of some compiler-specific behavior and so I could use the intuitive name "stack", but I could have just as easily done this instead which achieves similar results:
Foo p = Foo(123.0 + 2.0);
std::vector<unsigned char> v(1024, 0);
std::cout << p.GetF() << std::endl;
This is indeed unsafe (it has undefined behavior), and the asan AddressSanitizerUseAfterScope will detect this:
$ g++ -ggdb3 a.cpp -fsanitize=address -fsanitize-address-use-after-scope && ./a.out
125.000000 125.000000
=================================================================
==11748==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fff1bbfdab0 at pc 0x000000400b80 bp 0x7fff1bbfda20 sp 0x7fff1bbfda18
READ of size 8 at 0x7fff1bbfdab0 thread T0
#0 0x400b7f in Foo::GetF() const a.cpp:12
#1 0x4009ca in main a.cpp:18
#2 0x7fac0bd05d5c in __libc_start_main (/lib64/libc.so.6+0x1ed5c)
#3 0x400808 (a.out+0x400808)
Address 0x7fff1bbfdab0 is located in stack of thread T0 at offset 96 in frame
#0 0x4008e6 in main a.cpp:16
This frame has 2 object(s):
[32, 40) 'p'
[96, 104) '<unknown>' <== Memory access at offset 96 is inside this variable
In order to use AddressSanitizerUseAfterScope, you need at least Clang 5.0 or GCC 7.1.
Valgrind is good at detecting invalid use of heap memory, but because it runs on an unaltered program file it cannot in general detect stack use bugs.
Your code is unsafe because the parameter double const& fx is bound to a temporary, a materialized prvalue double with value 125.0. This temporary has lifetime terminating at the end of the statement-expression Foo p(123.0 + 2.0).
One way to make your code safe is to use aggregate lifetime extension (Extending temporary's lifetime through rvalue data-member works with aggregate, but not with constructor, why?), by removing the constructor Foo::Foo(double const&), and changing the initializer of p to use the list-initialization syntax:
Foo p{123.0 + 2.0};
// ^ ^
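A sketch of the aggregate case, using a hypothetical Probe type in place of double so that the lifetime is observable. Agg has no constructor, so the braces perform aggregate initialization and the reference member binds directly to the temporary.

```cpp
// Hypothetical helper: records in *alive whether the object still exists.
struct Probe {
    bool* alive;
    explicit Probe(bool* flag) : alive(flag) { *alive = true; }
    ~Probe() { *alive = false; }
};

// An aggregate: no user-declared constructors, just a reference member.
struct Agg {
    const Probe& ref;
};

bool alive_after_aggregate_init() {
    bool alive = false;
    Agg a{Probe(&alive)};  // reference member bound directly to the temporary
    (void)a;
    return alive;          // reports whether the temporary outlived the declaration
}
```

This relies on list-initialization with braces; routing the same temporary through a constructor parameter would not extend its lifetime.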
If the temporary variable exists at the point where the reference is used, then the behavior is well defined. And in this case this temporary variable exists exactly because it is referenced! From the C++11 standard, section 12.2/5:
The temporary to which the reference is bound or the temporary that is
the complete object of a subobject to which the reference is bound
persists for the lifetime of the reference ...
Yes, the word hidden by '...' is the "except" and multiple exceptions are listed there, but none of them are applicable in this example case. So this is legal and well defined, should produce no warnings, but not very widely known corner case.
If the temporary variable exists at the point where the reference is used, then the behaviour is well defined.
If the temporary ceases to exist before the reference is used, then the behaviour of using the reference is undefined.
Unfortunately, your code is an example of the latter. The temporary which holds the result of 123.0 + 2.0 ceases to exist when the statement Foo p(123.0 + 2.0) finishes. The next statement printf("%f\n", p.GetF()) then accesses a reference to that temporary which no longer exists.
Generally speaking, undefined behaviour is considered unsafe - it means there is no requirement on what the code actually does. The result you are seeing in testing is not guaranteed.
As others say it is currently unsafe. So it should be compile-time checked. So when one stores the reference one should also forbid rvalues:
Foo(double &&)=delete;
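How the deleted overload plays out, as a minimal sketch based on Foo from the question. The static_asserts and the helper function are additions for illustration.

```cpp
#include <type_traits>

struct Foo {
    const double& f;
    explicit Foo(const double& fx) : f(fx) {}
    Foo(double&&) = delete;  // rvalues (temporaries) are rejected at compile time
    double GetF() const { return f; }
};

// An lvalue double is accepted; a temporary double selects the deleted overload.
static_assert(std::is_constructible<Foo, double&>::value, "lvalues are accepted");
static_assert(!std::is_constructible<Foo, double>::value, "rvalues are rejected");

double read_through_foo() {
    double x = 123.0 + 2.0;  // a named object that outlives p
    Foo p(x);
    return p.GetF();
}
```

With this in place, Foo p(123.0 + 2.0); no longer compiles, which turns the silent dangling reference into a diagnostic. The caller is still responsible for keeping the named object alive as long as the Foo.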

Using of temporary lifetime difference

We are in a hot discussion with my friends about the code:
#include <iostream>
#include <string>
using namespace std;
string getString() {
return string("Hello, world!");
}
int main() {
char const * str = getString().c_str();
std::cout << str << "\n";
return 0;
}
This code produces different outputs on g++, clang and vc++:
g++ and clang output is the same:
Hello, world!
However vc++ outputs nothing (or just spaces):
Which behavior is correct? Could this be due to a change in the standard regarding the lifetime of temporaries?
As far as I can see by reading IR of clang++, it works as following:
store `getString()`'s return value in %1
std::cout << %1.c_str() << "\n";
destruct %1
Personally, I think gcc works this way too (I've tested it with rvo/move verbosity, i.e. custom ctors and dtors which print to std::cout). Why does vc++ work differently?
clang = Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
g++ = gcc version 4.9.2 (Debian 4.9.2-10)
Your program has undefined behaviour! You are "printing" a dangling pointer.
The result of getString(), a temporary string, lives no longer than that const char* declaration; accordingly neither does the result of invoking c_str() on that temporary.
So both compilers are "correct"; it is you and your friends who are wrong.
This is why we shall not store the result of std::string::c_str(), unless we really, really need to.
Both are right, undefined behaviour is undefined.
char const * str = getString().c_str();
getString() returns a temporary, which will be destroyed at the end of the full expression which contains it. So after that line is finished, str is an invalid pointer and trying to inspect it will plunge you into the land of undefined behaviour.
Some standards quotes, as requested (from N4140):
[class.temporary]/3: Temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created.
basic_string::c_str is specified like so:
[string.accessors]/1: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
Since strings have their contents stored contiguously ([string.require]/4) this essentially means "return a pointer to the start of the buffer".
Obviously when a std::string is destructed it will reclaim any memory which was allocated, making that pointer invalid (if your friends don't believe that, they have other problems).
That is undefined behavior so anything can happen (including printing the string "correctly").
Making things "working" anyway happens quite often with UB, unless the program is actually running on a paying customer's computer or if it's shown on the big screen in front of a vast audience ;-)
The problem is that you're taking a const char * pointing inside a temporary object that is destroyed before your use of the pointer.
Note that this is not the same situation as with:
const std::string& str = getString(); // Returns a temporary
std::cout << str << "\n";
because in this case instead there is a very specific rule about references bound to temporaries in the C++ standard. In this case the lifetime of the temporary will be extended until the reference str is also destroyed. The rule only applies to references and only if directly bound to the temporary or to a sub-object of the temporary (like const std::string& s = getObj().s;) and not to the result of calling methods of a temporary object.
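Two safe ways to use c_str() on the result of getString(), as a minimal sketch. The helper function names are made up for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

std::string getString() { return std::string("Hello, world!"); }

// Keep the string alive in a named object; the pointer stays valid
// for as long as `owner` exists and is not modified.
std::size_t length_via_owned_string() {
    const std::string owner = getString();
    const char* str = owner.c_str();
    return std::strlen(str);
}

// Use the pointer inside the same full-expression that creates the
// temporary; the temporary is destroyed only after strlen has returned.
std::size_t length_within_full_expression() {
    return std::strlen(getString().c_str());
}
```

In both cases the pointer is never used after the std::string it points into has been destroyed.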

Is it valid C++ to cast an rvalue to a const pointer?

In a moment of haste, needing a pointer to an object to pass to a function, I took the address of an unnamed temporary object, and to my surprise it compiled (the original code had warnings turned further down and lacked the const correctness present in the example below). Curious, I set up a controlled environment with warnings all the way up and treating warnings as errors in Visual Studio 2013.
Consider the following code:
class Contrived {
int something;
};
int main() {
const Contrived &r = Contrived(); // this is well defined even in C++03, the object lives until r goes out of scope
const Contrived *p1 = &r; // compiles fine, given the type of r this should be fine. But is it considering r was initialized with an rvalue?
const Contrived *p2 = &(const Contrived&)Contrived(); // this is handy when calling functions, is it valid? It also compiles
const int *p3 = &(const int&)27; // it works with PODs too, is it valid C++?
return 0;
}
The three pointer initializations are all more or less the same thing. The question is, are these initializations valid C++ under C++03, C++11, or both? I ask about C++11 separately in case something changed, considering that a lot of work was put in around rvalue references. It may not seem worthwhile to assign these values such as in the above example, but it's worth noting this could save some typing if such values are being passed to a function taking constant pointers and you don't have an appropriate object lying around or feel like making a temporary object on a line above.
EDIT:
Based on the answers the above is valid C++03 and C++11. I'd like to call out some additional points of clarification with regard to the resulting objects' lifetimes.
Consider the following code:
class Contrived {
int something;
} globalClass;
int globalPOD = 0;
template <typename T>
void SetGlobal(const T *p, T &global) {
global = *p;
}
int main() {
const int *p1 = &(const int&)27;
SetGlobal<int>(p1, globalPOD); // does *p1 still exist at the point of this call?
SetGlobal<int>(&(const int&)27, globalPOD); // since the rvalue expression is cast to a reference at the call site does *p exist within SetGlobal
// or similarly with a class
const Contrived *p2 = &(const Contrived&)Contrived();
SetGlobal<Contrived>(p2, globalClass);
SetGlobal<Contrived>(&(const Contrived&)Contrived(), globalClass);
return 0;
}
The question is are either or both of the calls to SetGlobal valid, in that they are passing a pointer to an object that will exist for the duration of the call under the C++03 or C++11 standard?
An rvalue is a type of expression, not a type of object. We're talking about the temporary object created by Contrived(), it doesn't make sense to say "this object is an rvalue". The expression that created the object is an rvalue expression, but that's different.
Even though the object in question is a temporary object, its lifetime has been extended. It's perfectly fine to perform operations on the object using the identifier r which denotes it. The expression r is an lvalue.
p1 is OK. On the p2 and p3 lines, the lifetime of the reference ends at the end of that full-expression, so the temporary object's lifetime also ends at that point. So it would be undefined behaviour to use p2 or p3 on subsequent lines. The initializing expression could be used as an argument to a function call though, if that's what you meant.
The first one is good: the expression r is not in fact an rvalue.
The other two are technically valid, too, but be aware that pointers become dangling at the end of the full expression (at the semicolon), and any attempt to use them would exhibit undefined behavior.
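The one usage that stays within the rules is passing the address inside the same full-expression, as in this sketch built on SetGlobal from the question. The wrapper function is made up for this example.

```cpp
int globalPOD = 0;

template <typename T>
void SetGlobal(const T* p, T& global) {
    global = *p;  // *p is read while the call is still in progress
}

int set_from_temporary() {
    // The temporary created by the cast lives until the end of this
    // full-expression, i.e. until after SetGlobal has returned.
    SetGlobal<int>(&(const int&)27, globalPOD);
    return globalPOD;
}
```

Storing the pointer first, as in const int *p1 = &(const int&)27;, and calling SetGlobal on a later line would read through a dangling pointer instead.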
While it is perfectly legal to pass an rvalue by const&, you have to be aware that your code ends up with invalidated pointers in p2 and p3, since the lifetime of the objects that they point is over.
To exemplify this, consider the following code that is often used to pass a temporary by reference:
template<typename T>
void pass_by_ref(T const&);
A function like this can be called with an lvalue or rvalue as its argument (and often is). Inside that function you can obviously take the address of your argument - it is just a reference to a const object after all... You are basically doing the exact same thing without the help of a function.
In fact, in C++11, you can go one step further and obtain a non-const pointer to a temporary:
template<typename T>
typename std::remove_reference<T>::type* example(T&& t)
{
return &t;
}
Note that the object the return value points to will only still exist if this function is called with an lvalue (since its argument will turn out to be typename remove_reference<T>::type& && which is typename remove_reference<T>::type&).
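The lvalue case can be sketched as follows; example is the function from the answer, repeated so the snippet is self-contained, and points_to_original is a made-up helper.

```cpp
#include <type_traits>

template<typename T>
typename std::remove_reference<T>::type* example(T&& t)
{
    return &t;
}

bool points_to_original() {
    int x = 27;
    int* p = example(x);  // T deduces to int&, so t refers to x itself
    return p == &x;       // x is still alive here, so p is safe to use
}
```

Called with an rvalue such as example(27), the returned pointer would refer to a temporary that is gone by the end of the full-expression.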

Global const string& smells bad to me, is it truly safe?

I'm reviewing a colleague's code, and I see he has several constants defined in the global scope as:
const string& SomeConstant = "This is some constant text";
Personally, this smells bad to me because the reference is referring to what I'm assuming is an "anonymous" object constructed from the given char array.
Syntactically, it's legal (at least in VC++ 7), and it seems to run, but really I'd rather have him remove the & so there's no ambiguity as to what it's doing.
So, is this TRULY safe and legal and I'm obsessing? Does the temp object being constructed have a guaranteed lifetime? I had always assumed anonymous objects used in this manner were destructed after use...
So my question could also be generalized to anonymous object lifetime. Does the standard dictate the lifetime of an anonymous object? Would it have the same lifetime as any other object in that same scope? Or is it only given the lifetime of the expression?
Also, when doing it as a local, it's obviously scoped differently:
class A
{
string _str;
public:
A(const string& str) :
_str(str)
{
cout << "Constructing A(" << _str << ")" << endl;
}
~A()
{
cout << "Destructing A(" << _str << ")" << endl;
}
};
void TestFun()
{
A("Outer");
cout << "Hi" << endl;
}
Shows:
Constructing A(Outer)
Destructing A(Outer)
Hi
It's completely legal. It will not be destructed until the program ends.
EDIT: Yes, it's guaranteed:
"All objects which do not have dynamic
storage duration, do not have thread
storage duration, and are not local
have static storage duration. The
storage for these objects shall last
for the duration of the program
(3.6.2, 3.6.3)."
-- 2008 Working Draft, Standard for Programming Language C++, § 3.7.1 p. 63
As Martin noted, this is not the whole answer. The standard draft further notes (§ 12.2, p. 250-1):
"Temporaries of class type are created
in various contexts: binding an rvalue
to a reference (8.5.3) [...] Even when
the creation of the temporary object
is avoided (12.8), all the semantic
restrictions shall be respected as if
the temporary object had been created.
[...] Temporary objects are destroyed
as the last step in evaluating the
full-expression (1.9) that (lexically)
contains the point where they were
created. [...] There are two contexts
in which temporaries are destroyed at
a different point than the end of the
full-expression. [...] The second
context is when a reference is bound
to a temporary. The temporary to which
the reference is bound or the
temporary that is the complete object
of a subobject to which the reference
is bound persists for the lifetime of
the reference except as specified
below."
I tested in g++ if that makes you feel any better. ;)
Yes it is valid and legal.
const string& SomeConstant = "This is some constant text";
// Is equivalent to:
const string& SomeConstant = std::string("This is some constant text");
Thus you are creating a temporary object.
This temporary object is bound to a const& and thus has its lifetime extended to the lifespan of the variable it is bound to (i.e. longer than the expression in which it was created).
This is guaranteed by the standard.
Note:
Though it is legal, I would not use it. The easiest solution would be to convert it into a const std::string.
Usage:
In this situation, because the variable is in global scope, it is valid for the full length of the program. So it can be used as soon as execution enters main() and should not be accessed after execution exits main().
Though it technically may be available before this, your usage of it in constructors/destructors of global objects should be tempered with the known problem of global variable initialization order.
Extra Thoughts:
This on the other hand will not suffer from the problem:
char const* SomeConstant = "This is some constant text";
And can be used at any point. Just a thought.
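If a std::string is wanted and the constant may be needed during the initialization of other globals, one common way around the initialization-order problem mentioned above is a function-local static. This is a sketch of that alternative rather than something proposed in the thread; it turns SomeConstant into a function.

```cpp
#include <string>

// The string is constructed the first time the function is called,
// so it is ready even when called from another global's constructor.
const std::string& SomeConstant() {
    static const std::string value = "This is some constant text";
    return value;
}
```

Every call returns a reference to the same object. Use from destructors of other static objects still needs care, since the string is destroyed during program shutdown.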
It might be legal, but it's still ugly. Leave out the reference!
const string SomeConstant = "This is some constant text";
It's as legal as it's ugly.
It's legal to extend the lifetime of a temporary variable with a const reference; this is used by Alexandrescu's ScopeGuard. See this excellent explanation by Herb Sutter called A candidate for the "Most important const".
That being said this specific case is an abuse of this feature of C++ and the reference should be removed leaving a plain const string.
Declaring it as const (which means it can't be changed) and then making it a reference, which implies that someone might change it, seems like bad form at the very least. Plus, as I am sure you understand, global variables are BAD, and rarely necessary.
Okay, folks, correct me if I'm off the deep end, but here are my conclusions after listening to all of your excellent responses:
A) it is syntactically and logically legal; binding to the const & extends the lifetime of the temp/anonymous object from expression level to the life of the reference. I verified this in VC++7 with:
class A {
public: A() { cout << "constructing A" << endl; }
public: ~A() { cout << "destructing A" << endl; }
};
void Foo()
{
A();
cout << "Foo" << endl;
}
void Bar()
{
const A& someA = A();
cout << "Bar" << endl;
}
int main()
{
Foo(); // outputs constructing A, destructing A, Foo
Bar(); // outputs constructing A, Bar, destructing A
return 0;
}
B) Though it is legal, it can lead to some confusion as to the actual lifetime, and the reference in these cases gives you no benefit over declaring it as a non-reference; thus the reference should probably be avoided and may even take extra space. Since there's no benefit to it, it's unnecessary obfuscation.
Thanks for all the answers; it was a very interesting discussion. So the long and short of it: yes, it's syntactically legal; no, it's not technically dangerous, as the lifetime is extended; but it adds nothing and may add cost and confusion, so why bother.
Sound right?