Why are references forbidden in std::variant? - c++

I use boost::variant a lot and am quite familiar with it. boost::variant does not restrict the bounded types in any way, in particular, they may be references:
#include <boost/variant.hpp>
#include <cassert>
int main() {
int x = 3;
boost::variant<int&, char&> v(x); // v can hold references
boost::get<int>(v) = 4; // manipulate x through v
assert(x == 4);
}
I have a real use-case for using a variant of references as a view of some other data.
I was then surprised to find that std::variant does not allow references as bounded types: std::variant<int&, char&> does not compile, and it says here explicitly:
A variant is not permitted to hold references, arrays, or the type void.
I wonder why this is not allowed, I don't see a technical reason. I know that the implementations of std::variant and boost::variant are different, so maybe it has to do with that? Or did the authors think it is unsafe?
PS: I cannot really work around the limitation of std::variant using std::reference_wrapper, because the reference wrapper does not allow assignment from the base type.
#include <variant>
#include <cassert>
#include <functional>
int main() {
using int_ref = std::reference_wrapper<int>;
int x = 3;
std::variant<int_ref> v(std::ref(x)); // v can hold references
static_cast<int&>(std::get<int_ref>(v)) = 4; // manipulate x through v, extra cast needed
assert(x == 4);
}

Fundamentally, the reason that optional and variant don't allow reference types is that there's disagreement on what assignment (and, to a lesser extent, comparison) should do for such cases. optional is easier than variant to show in examples, so I'll stick with that:
int i = 4, j = 5;
std::optional<int&> o = i;
o = j; // (*)
The marked line can be interpreted to either:
(1) Rebind o, such that &*o == &j. As a result of this line, the values of i and j themselves remain unchanged.
(2) Assign through o, such that &*o == &i is still true but now i == 5.
(3) Disallow assignment entirely.
Assign-through is the behavior you get by just pushing = through to T's =, rebind is a more sound implementation and is what you really want (see also this question, as well as a Matt Calabrese talk on Reference Types).
A different way of explaining the difference between (1) and (2) is how we might implement both externally:
// rebind
o.emplace(j);
// assign through
if (o) {
*o = j;
} else {
o.emplace(j);
}
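Since std::optional<int&> does not compile in C++17, the two interpretations can only be approximated. The following is a minimal sketch that uses std::optional<std::reference_wrapper<int>> as a stand-in; the alias opt_ref and the helper names rebind and assign_through are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Stand-in for the hypothetical optional<int&> (assumption: C++17 or later).
using opt_ref = std::optional<std::reference_wrapper<int>>;

// Interpretation (1): rebind. The result refers to j; i and j are untouched.
opt_ref rebind(int& i, int& j) {
    opt_ref o = std::ref(i);
    o = std::ref(j);   // assigning a reference_wrapper rebinds it
    return o;
}

// Interpretation (2): assign through. The result still refers to i,
// but i now holds the value of j.
opt_ref assign_through(int& i, int& j) {
    opt_ref o = std::ref(i);
    o->get() = j;      // writes to the referenced int
    return o;
}
```

With reference_wrapper the two operations are spelled differently, so the ambiguity of the marked line does not arise.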
The Boost.Optional documentation provides this rationale:
Rebinding semantics for the assignment of initialized optional references has been chosen to provide consistency among initialization states even at the expense of lack of consistency with the semantics of bare C++ references. It is true that optional<U> strives to behave as much as possible as U does whenever it is initialized; but in the case when U is T&, doing so would result in inconsistent behavior w.r.t to the lvalue initialization state.
Imagine optional<T&> forwarding assignment to the referenced object (thus changing the referenced object value but not rebinding), and consider the following code:
optional<int&> a = get();
int x = 1 ;
int& rx = x ;
optional<int&> b(rx);
a = b ;
What does the assignment do?
If a is uninitialized, the answer is clear: it binds to x (we now have another reference to x). But what if a is already initialized? it would change the value of the referenced object (whatever that is); which is inconsistent with the other possible case.
If optional<T&> would assign just like T& does, you would never be able to use Optional's assignment without explicitly handling the previous initialization state unless your code is capable of functioning whether after the assignment, a aliases the same object as b or not.
That is, you would have to discriminate in order to be consistent.
If in your code rebinding to another object is not an option, then it is very likely that binding for the first time isn't either. In such case, assignment to an uninitialized optional<T&> shall be prohibited. It is quite possible that in such a scenario it is a precondition that the lvalue must be already initialized. If it isn't, then binding for the first time is OK while rebinding is not which is IMO very unlikely. In such a scenario, you can assign the value itself directly, as in:
assert(!!opt);
*opt=value;
Lack of agreement on what that line should do meant it was easier to just disallow references entirely, so that most of the value of optional and variant can at least make it for C++17 and start being useful. References could always be added later - or so the argument went.

The fundamental reason is that a reference must be assigned to something.
Unions naturally do not (and cannot) set all their fields simultaneously, and therefore simply cannot contain references. From the C++ standard:
If a union contains a non-static data member of reference type the
program is ill-formed.
std::variant is a union with extra data denoting the type currently assigned to the union, so the above statement implicitly holds true for std::variant as well. Even if it were to be implemented as a straight class rather than a union, we'd be back to square one and have an uninitialised reference when a different field was in use.
Of course we can get around this by faking references using pointers, but this is what std::reference_wrapper takes care of.
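As a rough sketch of that workaround, a variant of std::reference_wrapper alternatives can act as a view onto existing objects, with std::visit reaching the referenced value. The alias ref_view and the function bump are hypothetical names introduced here, not part of any library.

```cpp
#include <cassert>
#include <functional>
#include <variant>

// A view that refers to either an int or a char living elsewhere.
using ref_view = std::variant<std::reference_wrapper<int>,
                              std::reference_wrapper<char>>;

// Increment whichever object the view currently refers to.
void bump(const ref_view& v) {
    // The wrapper is copied into the lambda; get() yields the real reference.
    std::visit([](auto ref) { ++ref.get(); }, v);
}
```

Calling bump(ref_view(std::ref(x))) modifies x itself, because only the wrapper is copied, never the referenced object.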

Related

Rationale for const ref binding to a different type?

I recently learned that it's possible to assign a value to a reference of a different type. Concrete example:
const std::optional<float>& ref0 = 5.0f;
const std::optional<float>& ref1 = get_float();
That's surprising to me. I would certainly expect this to work with a non-reference, but assumed that references only bind to the same type.
I found a pretty good chunk of the c++ standard which talks about all kinds of ways this works: https://eel.is/c++draft/dcl.init.ref#5. But I would appreciate some insight: When is this ever desirable?
A particular occasion where this hurt me recently was this:
auto get_value() -> std::optional<float>{ /* ... */ }
const std::optional<float>& value = get_value();
// check and use value...
I later then changed the return value of the function to a raw float, expecting all uses with a reference type to fail. They did not. Without paying attention, all the useless checking code would have stayed in place.
The basic reason is one of consistency. Since const-reference parameters are very widely used not for reference semantics but merely to avoid copying, one would expect each of
void y(X);
void z(const X&);
to accept anything, rvalue or otherwise, that can be converted to an X. Initializing a local variable has the same semantics.
This syntax also once had a practical value: in C++03, the results of functions (including conversions) were notionally copied:
struct A {A(int);};
struct B {operator A() const;};
void g() {
A o=B(); // return value copied into o
const A &r=3; // refers to (lifetime-extended) temporary
}
There was already permission to elide these copies, and in this sort of trivial case it was common to do so, but the reference guaranteed it.
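A small sketch of what the binding actually does may help. Here the const reference is bound to a temporary std::optional<float> created by conversion, so it does not alias the original float. The function name binds_to_temporary is made up for this example.

```cpp
#include <cassert>
#include <optional>

// Returns true when the const reference is bound to a converted temporary
// rather than to the float it was initialized from.
bool binds_to_temporary() {
    float f = 1.0f;
    const std::optional<float>& ref = f;  // f is converted; ref binds to the temporary
    f = 2.0f;                             // this change is not visible through ref
    return ref.has_value() && *ref == 1.0f;
}
```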

Why GCC rejects std::optional for references?

std::optional<int&> xx; just doesn't compile for the latest gcc-7.0.0 snapshot. Does the C++17 standard include std::optional for references? And why, if it doesn't? (The implementation with pointers in a dedicated specialization would cause no problems, I guess.)
Because optional, as standardized in C++17, does not permit reference types. This was excluded by design.
There are two reasons for this. The first is that, structurally speaking, an optional<T&> is equivalent to a T*. They may have different interfaces, but they do the same thing.
The second thing is that there was effectively no consensus by the standards committee on questions of exactly how optional<T&> should behave.
Consider the following:
optional<T&> ot = ...;
T t = ...;
ot = t;
What should that last line do? Is it taking the object being referenced by ot and copy-assign to it, such that *ot == t? Or should it rebind the stored reference itself, such that ot.get() == &t? Worse, will it do different things based on whether ot was engaged or not before the assignment?
Some people will expect it to do one thing, and some people will expect it to do the other. So no matter which side you pick, somebody is going to be confused.
If you had used a T* instead, it would be quite clear which happens:
T* pt = ...;
T t = ...;
pt = t; //Compile error. Be more specific.
*pt = t; //Assign to pointed-to object.
pt = &t; //Change pointer.
In [optional]:
A program that necessitates the instantiation of template optional for a reference type, or for possibly cv-qualified types in_place_t or nullopt_t is ill-formed.
There is no std::optional<T&>. For now, you'll have to use std::optional<std::reference_wrapper<T>>.
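For illustration, here is a minimal sketch of how std::optional<std::reference_wrapper<T>> might be used as an "optional reference" return type. The helper find_first is hypothetical and assumes C++17.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical helper: returns a reference to the first element equal to
// `value`, or an empty optional when no such element exists.
std::optional<std::reference_wrapper<int>> find_first(std::vector<int>& v, int value) {
    for (int& element : v) {
        if (element == value) {
            return std::ref(element);
        }
    }
    return std::nullopt;
}
```

The caller has to write hit->get() = ... to assign through, while hit = std::ref(other) rebinds, so both operations stay explicit.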

What is the rationale for extending the lifetime of temporaries?

In C++, the lifetime of a temporary value can be extended by binding it to a reference:
Foo make_foo();
{
Foo const & r1 = make_foo();
Foo && r2 = make_foo();
// ...
} // both objects are destroyed here
Why is this allowed? What problem does this solve?
I couldn't find an explanation for this in Design and Evolution (e.g. 6.3.2: Lifetime of Temporaries). Nor could I find any previous questions about this (this one came closest).
This feature is somewhat unintuitive and has subtle failure modes. For example:
Foo const & id(Foo const & x) { return x; } // looks like a fine function...
Foo const & r3 = id(make_foo()); // ... but causes a terrible error!
Why is something that can be so easily and silently abused part of the language?
Update: the point may be subtle enough to warrant some clarification: I do not dispute the use of the rule that "references bind to temporaries". That is all fine and well, and allows us to use implicit conversions when binding to references. What I am asking about is why the lifetime of the temporary is affected. To play the devil's advocate, I could claim that the existing rules of "lifetime until end of full expression" already cover the common use cases of calling functions with temporary arguments.
The simple answer is that you need to be able to bind a temporary to a const reference; not having that feature would require a good amount of code duplication, with functions taking const& for lvalue or value arguments or by-value for rvalue arguments.
Once you need that, the language needs to define some semantics that will guarantee that the lifetime of the temporary is at least as long as that of the reference.
Once you accept that a reference can bind to an rvalue in one context, just for consistency you may want to extend the rule to allow the same binding in other contexts, and the semantics are really the same. The temporary lifetime is extended until the reference goes away (be it a function parameter, or a local variable).
The alternative would be rules that allow binding in some contexts (function call) but not all (local reference) or rules that allow both and always create a dangling reference in the latter case.
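The effect on a local reference can be observed with a small tracing type. This is only a sketch: Tracer, make_tracer, run_demo and the global events log are names invented for the example, and the recorded order assumes C++17, where the returned prvalue is not copied.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records the order in which things happen

struct Tracer {
    Tracer() { events.push_back("ctor"); }
    ~Tracer() { events.push_back("dtor"); }
};

Tracer make_tracer() { return Tracer(); }

void run_demo() {
    events.clear();
    {
        Tracer const& r = make_tracer();  // temporary now lives as long as r
        (void)r;
        events.push_back("end of block");
    }                                     // the temporary is destroyed here
    events.push_back("after block");
}
```

The destructor runs when the reference goes out of scope, not at the end of the full expression that created the temporary.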
Removed the quote from the answer, left here so that comments would still make sense:
If you look at the wording in the standard there are some hints as of this intended usage:
12.2/5 [middle of the paragraph]
[...] A temporary bound to a reference parameter in a function call (5.2.2) persists until the completion of the full expression containing the call. [...]
As Bjarne Stroustrup (the original designer) explained it in a clc++ posting in 2005, it was for uniform rules.
The rules for references are simply the most general and uniform I
could find. In the cases of arguments and local references, the
temporary lives as long as the reference to which it is bound. One
obvious use is as a shorthand for a complicated expression in a
deeply nested loop. For example:
for (int i = 0; i<xmax; ++i)
for (int j = 0; j< ymax; ++j) {
double& r = a[i][j];
for (int k = 0; k < zmax; ++k) {
// do something with a[i][j] and a[i][j][k]
}
}
This can improve readability as well as run-time performance.
And it turned out to be useful for storing an object of a class derived from the reference type, e.g. as in the original Scopeguard implementation.
In a clc++ posting in 2008, James Kanze supplied some more details:
The standard says exactly when the destructor must be called. Before
the standard, however, the ARM (and earlier language specifications)
were considerably looser: the destructor could be called anytime after
the temporary was "used" and before the next closing brace.
(The “ARM” is the Annotated Reference Manual by (IIRC) Bjarne Stroustrup and Margaret Ellis, which served as a de-facto standard in the last decade before the first ISO standard. Unfortunately my copy is buried in a box, under a lot of other boxes, in the outhouse. So I can't verify, but I believe this is correct.)
Thus, as with much else the details of lifetime extensions were honed and perfected in the standardization process.
Since James has raised this point in comments to this answer: that perfection could not reach back in time to affect Bjarne's rationale for the lifetime extension.
Example of Scopeguard-like code, where the temporary bound to the reference is the full object of derived type, with its derived type destructor executed at the end:
struct Base {};
template< class T >
struct Derived: Base {};
template< class T >
auto foo( T ) -> Derived<T> { return Derived<T>(); }
int main()
{
Base const& guard = foo( 42 );
}
I discovered an interesting application for lifetime extension somewhere here on SO. (I forget where, I'll add a reference when I find it.)
Lifetime extension allows us to use prvalues of immobile types.
For example:
struct Foo
{
Foo(int, bool, char);
Foo(Foo &&) = delete;
};
The type Foo cannot be copied nor moved. Yet, we can have a function that returns a prvalue of type Foo:
Foo make_foo()
{
return {10, false, 'x'};
}
Yet (before C++17's guaranteed copy elision) we cannot construct a local variable initialized with the return value of make_foo, so in general, calling the function will create a temporary object that is immediately destroyed. Lifetime extension allows us to use the temporary object throughout an entire scope:
auto && foo = make_foo();
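Putting the pieces together, a compilable sketch might look like the following. The type Immobile and the functions make_immobile and use_immobile are illustrative stand-ins for Foo and make_foo, not taken from the answer.

```cpp
#include <cassert>

struct Immobile {
    int value;
    Immobile(int v) : value(v) {}
    Immobile(Immobile&&) = delete;  // neither copyable nor movable
};

// Copy-list-initialization of the return object needs no move constructor.
Immobile make_immobile() {
    return {7};
}

int use_immobile() {
    auto&& obj = make_immobile();  // binds to the temporary, extending its lifetime
    obj.value += 1;                // the object stays usable for the whole scope
    return obj.value;
}
```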

How can a reference require no storage?

From this question, and consequently, from the Standard (ISO C++-03):
It is unspecified whether or not a reference requires storage (3.7).
In some answers in that thread, it's said that references have, internally, the same structure of a pointer, thus, having the same size of it (32/64 bits).
What I'm struggling to grasp is: how would a reference come not to require storage?
Any sample code exemplifying this would be greatly appreciated.
Edit:
From @JohannesSchaub-litb's comment: is there anything like, if I'm not using a const &, or if I'm using a const & with default value, it requires allocation? It seems to me, somehow, that there should be no allocations for references at all -- except, of course, when there are explicit allocations involved, like:
A& new_reference(*(new A())); // Only A() instance would be allocated,
// not the new_reference itself
Is there any case like this?
Take something simple:
int foo() {
int x = 5;
int& r = x;
r = 10;
return x;
}
The implementation may use a pointer to x behind the scenes to implement that reference, but there's no reason it has to. It could just as well translate the code to the equivalent form of:
int foo() {
int x = 10;
return x;
}
Then no pointers are needed whatsoever. The compiler can just bake it right into the executable that r is the same as x, without storing and dereferencing a pointer that points at x.
The point is, whether the reference requires any storage is an implementation detail that you shouldn't need to care about.
I believe the key point to understanding is that reference types are not object types.
An object type is a (possibly cv-qualified) type that is not a function type, not a reference type, and not a
void type (§3.9[basic.types]/8)
Objects require storage ("An object is a region of storage." -- §1.8[intro.object]/1)
Moreover, C++ programs operate on objects: "The constructs in a C++ program create, destroy, refer to, access, and manipulate objects." -- same paragraph
So, when the compiler encounters a reference in the program, it is up to the compiler whether it has to synthesize an object (typically of a pointer type), and, therefore, use some storage, or find some other way to implement the desired semantics in terms of object model (which may involve no storage).
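The fact that a reference is not an object of its own can be observed directly: neither sizeof nor the address-of operator can see the reference itself. A minimal sketch, with reference_is_alias as a made-up function name:

```cpp
#include <cassert>

// sizeof applied to a reference type yields the size of the referenced type.
static_assert(sizeof(int&) == sizeof(int), "sizeof sees through the reference");

bool reference_is_alias() {
    int x = 5;
    int& r = x;
    r = 10;
    // &r is the address of x; the reference has no address of its own.
    return &r == &x && x == 10;
}
```

Whether the compiler keeps a hidden pointer for r is not observable from within the program, which is why the standard can leave it unspecified.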

Could a smart compiler do all the things std::move does without it being part of the language?

This is a somewhat theoretical question, but although I have some basic understanding of std::move, I'm still not certain whether it provides some additional functionality to the language that theoretically couldn't be achieved with super-smart compilers. I know that code like:
{
std::string s1="STL";
std::string s2(std::move(s1));
std::cout << s1 <<std::endl;
}
is a new semantic behavior not just performance sugar. :D But tbh I guess nobody will use var x after doing std::move(x).
Also, for move-only data (std::unique_ptr<>, std::thread), couldn't the compiler automatically do the move construction and clearing of the old variable if the type is declared movable?
Again, this would mean that more code would be generated behind the programmer's back (for example, now you can count copy-constructor and move-constructor calls; with automagic std::move insertion you couldn't do that).
No.
But tbh I guess nobody will use var x after doing std::move(x)
Absolutely not guaranteed. In fact, a decent part of the reason why std::move(x) is not automatically usable by the compiler is because, well, it can't be decided automatically whether or not you intend this. It's explicitly well-defined behaviour.
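To make "well-defined" concrete, here is a short sketch of code that legitimately keeps using objects after moving from them. The function name moved_from_is_usable is hypothetical, and std::make_unique assumes C++14.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

bool moved_from_is_usable() {
    std::unique_ptr<int> p = std::make_unique<int>(42);
    std::unique_ptr<int> q = std::move(p);
    bool p_is_null = (p == nullptr);  // a moved-from unique_ptr is guaranteed null

    std::string s1 = "STL";
    std::string s2(std::move(s1));    // s1 is now valid but unspecified
    s1 = "reused";                    // giving it a new value is fine
    return p_is_null && *q == 42 && s2 == "STL" && s1 == "reused";
}
```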
Also, removing rvalue references would imply that the compiler can automagically write all the move constructors for you. This is definitely not true. D has a similar scheme, but it's a complete failure, because there are numerous useful situations in which the compiler-generated "move constructor" won't work correctly, but you can't change it.
It would also prevent perfect forwarding, which has other uses.
The Committee makes many stupid mistakes, but rvalue references are not one of them.
Edit:
Consider something like this:
int main() {
std::unique_ptr<int> x = std::make_unique<int>();
some_func_that_takes_ownership(x);
int input = 0;
std::cin >> input;
if (input == 0)
some_other_func(x);
}
Ouch. Now what? You can't magic the value of "input" to be known at compile-time. This is doubly a problem if the bodies of some_other_func and some_func_that_takes_ownership are unknown. This is the Halting Problem: you can't prove that x is or is not used after some_func_that_takes_ownership.
D fails. I promised an example. Basically, in D, "move" is "binary copy and don't destruct the old". Unfortunately, consider a class with, say, a pointer to itself- something you will find in most string classes, most node-based containers, in designs for std::function, boost::variant, and lots of other similar handy value types. The pointer to the internal buffer will be copied but oh noes! points to the old buffer, not the new one. Old buffer is deallocated - GG your program.
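The failure mode can be reproduced in C++ with a raw byte copy. This is a sketch only: SmallBuf is a hypothetical self-referential type, and std::memcpy stands in for the "binary copy" move described above.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical small-buffer type whose pointer refers into the object itself.
struct SmallBuf {
    char storage[16];
    char* data;
    SmallBuf() : storage{}, data(storage) {}
};

// memcpy is only permitted because the type is trivially copyable.
static_assert(std::is_trivially_copyable<SmallBuf>::value, "memcpy must be valid");

bool bitwise_copy_points_to_source() {
    SmallBuf a;
    SmallBuf b;
    std::memcpy(&b, &a, sizeof(SmallBuf));  // "move" implemented as a byte copy
    // b.data still points into a's buffer, not into b's own.
    return b.data == a.storage && b.data != b.storage;
}
```

A user-written move constructor would reset data to point at the new object's storage, which is exactly the customization a purely bitwise move rules out.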
It depends on what you mean by "what move does". To satisfy your curiosity, I think what you're looking for is to be told about the existence of Uniqueness Type Systems and Linear Type Systems.
These are type systems that enforce, at compile-time (in the type system), that a value only be referenced by one location, or that no new references be made. std::unique_ptr is the best approximation C++ can provide, given its rather weak type system.
Let's say we had a new storage-class specifier called uniqueref. This is like const, and specifies that the value has a single unique reference; nobody else has the value. It would enable this:
int main()
{
int* uniqueref x(new int); // only x has this reference
// unique type feature: error, would no longer be unique
auto y = x;
// linear type feature: okay, x no longer usable, z is now the unique owner
auto z = uniquemove(x);
// linear type feature: error: x is no longer usable
*x = 5;
}
(Also interesting to note the immense optimizations that can be made, knowing a pointer value is really truly only referenced through that pointer. It's a bit like C99's restrict in that aspect.)
In terms of what you're asking, since we can now say that a type is uniquely referenced, we can guarantee that it's safe to move. That said, move operations are ultimately user-defined, and can do all sorts of weird stuff if desired, so implicitly doing this is a bad idea in current C++ anyway.
Everything above is obviously not formally thought-out and specified, but should give you an idea of what such a type system might look like. More generally, you probably want an Effect Type System.
But yes, these ideas do exist and are formally researched. C++ is just too established to add them.
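For comparison, this is roughly how far real C++ gets with std::unique_ptr. The uniqueness half is enforced at compile time, while the linear half (rejecting later uses of the moved-from variable) is not. The function name transfer is made up for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Uniqueness is enforced statically: copying is rejected by the compiler.
static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr cannot be copied");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "unique_ptr can be moved");

int transfer() {
    std::unique_ptr<int> x(new int(5));
    // auto y = x;                          // would not compile: copy is deleted
    std::unique_ptr<int> z = std::move(x);  // z is now the unique owner
    // No linear typing: later uses of x still compile; x is simply null here.
    return (x == nullptr) ? *z : -1;
}
```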
Doing this the way you suggest is a lot more complicated than necessary:
std::string s1="STL";
std::string s2(s1);
std::cout << s1 <<std::endl;
In this case, it is fairly sure that a copy is meant. But if you drop the last line, s1 essentially ends its lifetime after the construction of s2.
In a reference counted implementation, the copy constructor for std::string will only increment the reference counter, while the destructor will decrement and delete if it becomes zero.
So the sequence is
(inlined std::string::string(char const *))
determine string length
allocate memory
copy string
initialize reference counter to 1
initialize pointer in string object
(inlined std::string::string(std::string const &))
increment reference counter
copy pointer to string representation
Now the compiler can flatten that, simply initialize the reference counter to 2 and store the pointer twice. Common Subexpression Elimination then finds out that s1 and s2 keep the same pointer value, and merges them into one.
In short, the only difference in generated code should be that the reference counter is initialized to 2.
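The reference-counting behaviour described here can be imitated with std::shared_ptr. This is only a model under an explicit assumption: std::string in a conforming C++11 implementation is not reference counted, so a shared_ptr to a const string stands in for it. The function name owners_after_copy is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Models a reference-counted string with shared_ptr (an assumption made for
// illustration; std::string itself does not work this way since C++11).
long owners_after_copy() {
    auto s1 = std::make_shared<const std::string>("STL");  // count starts at 1
    auto s2 = s1;                                          // copy only bumps the count
    // Both handles point at the same representation.
    return (s1.get() == s2.get()) ? s1.use_count() : -1;
}
```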