Move reference overload for element access in containers? [duplicate] - c++

The standard C++ containers offer only one version of operator[] for containers like vector<T> and deque<T>. It returns a T& (other than for vector<bool>, which I'm going to ignore), which is an lvalue. That means that in code like this,
vector<BigObject> makeVector(); // factory function
auto copyOfObject = makeVector()[0]; // copy BigObject
copyOfObject will be copy constructed. Given that makeVector() returns an rvalue vector, it seems reasonable to expect copyOfObject to be move constructed.
If operator[] for such containers was overloaded for rvalue and lvalue objects, then operator[] for rvalue containers could return an rvalue reference, i.e., an rvalue:
template<typename T>
container {
public:
T& operator[](int index) &; // for lvalue objects
T&& operator[](int index) &&; // for rvalue objects
...
};
In that case, copyOfObject would be move constructed.
Is there a reason this kind of overloading would be a bad idea in general? Is there a reason why it's not done for the standard containers in C++14?

Converting comment into answer:
There's nothing inherently wrong with this approach; class member access follows a similar rule (E1.E2 is an xvalue if E1 is an rvalue and E2 names a non-static data member and is not a reference, see [expr.ref]/4.2), and elements inside a container are logically similar to non-static data members.
A significant problem with doing it for std::vector or other standard containers is that it will likely break some legacy code. Consider:
void foo(int &);
std::vector<int> bar();
foo(bar()[0]);
That last line will stop compiling if operator[] on an rvalue vector returned an xvalue. Alternatively - and arguably worse - if there is a foo(const int &) overload, it will silently start calling that function instead.
Also, returning a bunch of elements in a container and only using one element is already rather inefficient. It's arguable that code that does this probably doesn't care much about speed anyway, and so the small performance improvement is not worth introducing a potentially breaking change.

I think you will leave the container in an invalid state if you move out one of the elements, I would argue the need to allow that state at all. Second, if you ever need that, can't you just call the new object's move constructor like this:
T copyObj = std::move(makeVector()[0]);
Update:
Most important point is, again in my opinion, that containers are containers by their nature, so they should not anyhow modify the elements inside them. They just provide a storage, iteration mechanism, etc.

Related

How to check whether elements of a range should be moved?

There's a similar question: check if elements of a range can be moved?
I don't think the answer in it is a nice solution. Actually, it requires partial specialization for all containers.
I made an attempt, but I'm not sure whether checking operator*() is enough.
// RangeType
using IteratorType = std::iterator_t<RangeType>;
using Type = decltype(*(std::declval<IteratorType>()));
constexpr bool canMove = std::is_rvalue_reference_v<Type>;
Update
The question may could be split into 2 parts:
Could algorithms in STL like std::copy/std::uninitialized_copy actually avoid unnecessary deep copy when receiving elements of r-value?
When receiving a range of r-value, how to check if it's a range adapter like std::ranges::subrange, or a container which holds the ownership of its elements like std::vector?
template <typename InRange, typename OutRange>
void func(InRange&& inRange, OutRange&& outRange) {
using std::begin;
using std::end;
std::copy(begin(inRange), end(inRange), begin(outRange));
// Q1: if `*begin(inRange)` returns a r-value,
// would move-assignment of element be called instead of a deep copy?
}
std::vector<int> vi;
std::list<int> li;
/* ... */
func(std::move(vi), li2);
// Q2: Would elements be shallow copy from vi?
// And if not, how could I implement just limited count of overloads, without overload for every containers?
// (define a concept (C++20) to describe those who take ownership of its elements)
Q1 is not a problem as #Nicol Bolas , #eerorika and #Davis Herring pointed out, and it's not what I puzzled about.
(But I indeed think the API is confusing, std::assign/std::uninitialized_construct may be more ideal names)
#alfC has made a great answer about my question (Q2), and gives a pristine perspective. (move idiom for ranges with ownership of elements)
To sum up, for most of the current containers (especially those from STL), (and also every range adapter...), partial specialization/overload function for all of them is the only solution, e.g.:
template <typename Range>
void func(Range&& range) { /*...*/ }
template <typename T>
void func(std::vector<T>&& movableRange) {
auto movedRange = std::ranges::subrange{
std::make_move_iterator(movableRange.begin()),
std::make_move_iterator(movableRange.end())
};
func(movedRange);
}
// and also for `std::list`, `std::array`, etc...
I understand your point.
I do think that this is a real problem.
My answer is that the community has to agree exactly what it means to move nested objected (such as containers).
In any case this needs the cooperation of the container implementors.
And, in the case of standard containers, good specifications.
I am pessimistic that standard containers can be changed to "generalize" the meaning of "move", but that can't prevent new user defined containers from taking advantage of move-idioms.
The problem is that nobody has studied this in depth as far as I know.
As it is now, std::move seem to imply "shallow" move (one level of moving of the top "value type").
In the sense that you can move the whole thing but not necessarily individual parts.
This, in turn, makes useless to try to "std::move" non-owning ranges or ranges that offer pointer/iterator stability.
Some libraries, e.g. related to std::ranges simply reject r-value of references ranges which I think it is only kicking the can.
Suppose you have a container Bag.
What should std::move(bag)[0] and std::move(bag).begin() return? It is really up to the implementation of the container decide what to return.
It is hard to think of general data structures, bit if the data structure is simple (e.g. dynamic arrays) for consistency with structs (std::move(s).field) std::move(bag)[0] should be the same as std::move(bag[0]) however the standard strongly disagrees with me already here: https://en.cppreference.com/w/cpp/container/vector/operator_at
And it is possible that it is too late to change.
Same goes for std::move(bag).begin() which, using my logic, should return a move_iterator (or something of the like that).
To make things worst, std::array<T, N> works how I would expect (std::move(arr[0]) equivalent to std::move(arr)[0]).
However std::move(arr).begin() is a simple pointer so it looses the "forwarding/move" information! It is a mess.
So, yes, to answer your question, you can check if using Type = decltype(*std::forward<Bag>(bag).begin()); is an r-value but more often than not it will not implemented as r-value.
That is, you have to hope for the best and trust that .begin and * are implemented in a very specific way.
You are in better shape by inspecting (somehow) the category of the range itself.
That is, currently you are left to your own devices: if you know that bag is bound to an r-value and the type is conceptually an "owning" value, you currently have to do the dance of using std::make_move_iterator.
I am currently experimenting a lot with custom containers that I have. https://gitlab.com/correaa/boost-multi
However, by trying to allow for this, I break behavior expected for standard containers regarding move.
Also once you are in the realm of non-owning ranges, you have to make iterators movable by "hand".
I found empirically useful to distinguish top-level move(std::move) and element wise move (e.g. bag.mbegin() or bag.moved().begin()).
Otherwise I find my self overloading std::move which should be last resort if anything at all.
In other words, in
template<class MyRange>
void f(MyRange&& r) {
std::copy(std::forward<MyRange>(r).begin(), ..., ...);
}
the fact that r is bound to an r-value doesn't necessarily mean that the elements can be moved, because MyRange can simply be a non-owning view of a larger container that was "just" generated.
Therefore in general you need an external mechanism to detect if MyRange owns the values or not, and not just detecting the "value category" of *std::forward<MyRange>(r).begin() as you propose.
I guess with ranges one can hope in the future to indicate deep moves with some kind of adaptor-like thing "std::ranges::moved_range" or use the 3-argument std::move.
If the question is whether to use std::move or std::copy (or the ranges:: equivalents), the answer is simple: always use copy. If the range given to you has rvalue elements (i.e., its ranges::range_reference_t is either kind(!) of rvalue), you will move from them anyway (so long as the destination supports move assignment).
move is a convenience for when you own the range and decide to move from its elements.
The answer of the question is: IMPOSSIBLE. At least for the current containers of STL.
Assume if we could add some limitations for Container Requirements?
Add a static constant isContainer, and make a RangeTraits. This may work well, but not an elegant solution I want.
Inspired by #alfC , I'm considering the proper behaviour of a r-value container itself, which may help for making a concept (C++20).
There is an approach to distinguish the difference between a container and range adapter, actually, though it cannot be detected due to the defect in current implementation, but not of the syntax design.
First of all, lifetime of elements cannot exceed its container, and is unrelated with a range adapter.
That means, retrieving an element's address (by iterator or reference) from a r-value container, is a wrong behaviour.
One thing is often neglected in post-11 epoch, ref-qualifier.
Lots of existing member functions, like std::vector::swap, should be marked as l-value qualified:
auto getVec() -> std::vector<int>;
//
std::vector<int> vi1;
//getVec().swap(vi1); // pre-11 grammar, should be deprecated now
vi1 = getVec(); // move-assignment since C++11
For the reasons of compatibility, however, it hasn't been adopted. (It's much more confusing the ref-qualifier hasn't been widely applied to newly-built ones like std::array and std::forward_list..)
e.g., it's easy to implement the subscript operator as we expected:
template <typename T>
class MyArray {
T* _items;
size_t _size;
/* ... */
public:
T& operator [](size_t index) & {
return _items[index];
}
const T& operator [](size_t index) const& {
return _items[index];
}
T operator [](size_t index) && {
// not return by `T&&` !!!
return std::move(_items[index]);
}
// or use `deducing this` since C++23
};
Ok, then std::move(container)[index] would return the same result as std::move(container[index]) (not exactly, may increase an additional move operation overhead), which is convenient when we try to forward a container.
However, how about begin and end?
template <typename T>
class MyArray {
T* _items;
size_t _size;
/* ... */
class iterator;
class const_iterator;
using move_iterator = std::move_iterator<iterator>;
public:
iterator begin() & { /*...*/ }
const_iterator begin() const& { /*...*/ }
// may works well with x-value, but pr-value?
move_iterator begin() && {
return std::make_move_iterator(begin());
}
// or more directly, using ADL
};
So simple, like that?
No! Iterator will be invalidated after destruction of container. So deferencing an iterator from a temporary (pr-value) is undefined behaviour!!
auto getVec() -> std::vector<int>;
///
auto it = getVec().begin(); // Noooo
auto item = *it; // undefined behaviour
Since there's no way (for programmer) to recognize whether an object is pr-value or x-value (both will be duduced into T), retrieving iterator from a r-value container should be forbidden.
If we could regulate behaviours of Container, explicitly delete the function that obtain iterator from a r-value container, then it's possible to detect it out.
A simple demo is here:
https://godbolt.org/z/4zeMG745f
From my perspective, banning such an obviously wrong behaviour may not be so destructive that lead well-implemented old projects failing to compile.
Actually, it just requires some lines of modification for each container, and add proper constraints or overloads for range access utilities like std::begin/std::ranges::begin.

Is std::span a valid iterator::reference type?

I have trouble finding the semantics of the reference type trait of an iterator. Let's say I want to implement a chunk iterator, that, given a position into a range, will give me chunks of that range:
template<class T, int N>
class chunk_iterator {
public:
using reference = std::span<T,N>;
chunk_iterator(T* ptr): ptr(ptr) {}
chunk_iterator operator++() { ptr += N; return *this; }
reference operator*() const { return {ptr,N}; }
private:
T* ptr;
};
The problem that I see here is that std::span is a view-like thing, but it does not behave like a reference (say a std::array<T,N>& in this case). In particular, if I assign to a span, the assignement is shallow, it will not copy the value.
Is std::span a valid iterator::reference type? Are view and reference semantics explained in detail somewhere?
What should I do to solve my problem? Implement a span_ref with proper reference semantics? It it already implemented in some library? Is a non-native reference type even allowed?
(note: solving the problem by storing a std::array<T,N> and returning a std::array<T,N>& in operator* is doable, but ugly, and if N is not known at compile time, storing instead a std::vector<T> with dynamic memory allocation is just plain wrong)
When talking about standard-compliant iterators, it depends on several things.
For conforming Iterators, it almost doesn't matter what the reference type is because the standard does not require any usage semantics for the reference type. But that also means nobody except you knows how to use your iterator.
For conforming Input Iterators, the reference type must meet the semantics specified. Notice that for LegacyInputIterator, the expression *it must be a reference that is usable as a reference with all the normal semantics, otherwise code that uses your iterator will not behave as expected. This means reading from a reference is akin to reading from a built-in reference. In particular, the following should do "normal" things:
auto value = *itr; // this should read a value
In this situation, a view type like span wouldn't work because span is more like a pointer than a reference: in the above snippet value would be a span, not whatever the span refers to.
For conforming Output Iterators, the reference type has no requirements. In fact, standard LegacyOutputIterators like std::back_insert_iterator have void as a reference type.
For conforming Forward Iterators and above, the standard actually requires the reference be a built-in reference. This is to support uses like below:
auto& ref = *itr;
auto ptr = &ref; // this must create a pointer pointing to the original object
auto ref2 = *ptr; // this must create a second, equivalent reference
auto other = std::move( ref ); // this must do a "move", which may be the same as a copy
ref = other; // this must assign "other"'s value back into the referred-to object
If the above didn't work correctly, many of the standard algorithms wouldn't be possible to write generically.
Speaking to span specifically, it acts more like a pointer than a reference logically. It can be re-assigned to point to something else. Taking its address creates a pointer to the span, not a pointer to the container being spanned over. Calling std::move on a span copies the span, and doesn't move the contents of the spanned range. A built-in reference T& will only refer to one thing ever once it's been created.
Creating a non-conforming reference that actually works with standard algorithms would involve a family of types overloading operator*, operator->, and operator&, operator=, and std::move, and modeling pointers, lvalue references, and rvalue references.
The meaning of an iterator's reference type cannot be understood without comprehending its relationship to the iterator's value_type. An iterator is a construct that represents a position within a sequence of value_types. A reference is a mediator within this paradigm; it is a thing that acts like a value_type (const) &. Until you figure out what your value_type is going to be, you can't decide what your reference will need to look like.
What "acts like" means depends on what kind of iterator we're talking about.
For C++11, the InputIterator category requires that reference be a type which is implicitly convertible to a value_type. For the OutputIterator category, reference is required to be a type which is assignable from a value_type.
For all of the more restricted iterator categories (ForwardIterator and above), reference is required to be exactly one of value_type & (if you can write to the sequence) or value_type const & (if you can only read from the sequence).
Iterators where reference is not a value_type (const) & are often called proxy iterators, as the reference type typically acts as a "proxy" for the actual data stored in the sequence (assuming the iterator isn't just inventing values to begin with). Proxy iterators are often used for cases where the iterator doesn't iterate over a range of actual value_types, but simply pretends to. This could be the bitwise iterators of vector<bool> or an iterator that iterates over the sequence of integers on some half-open range [0, N).
But proxy iterator references have to act like language references to one degree or another. InputIterator references have to be implicitly convertible to the value_type. span<T, N> is not implicitly convertible to array<T, N> or any other container type that would be appropriate for a value_type. OutputIterator references have to be assignable from value_type. And while span<T, N> may be assignable from an array<T, N>, the assignment operation doesn't have the same meaning. To assign to an OutputIterator's reference ought to change the values stored within the sequence. And this doesn't.
In any case, you first need to invent a value_type that does what you need it to do. Then you need to build a proper reference type that acts like a reference. Lastly... well, you can't make your iterator a ForwardIterator or higher, because C++11 doesn't support proxy iterators of the most useful iterator categories. C++20's new formulation of iterators allows proxy iterators for anything that isn't a contiguous_iterator.

Why is erasing via a const_iterator allowed in C++11?

As of GCC 4.9.2, it's now possible to compile C++11 code that inserts or erases container elements via a const_iterator.
I can see how it makes sense for insert to accept a const_iterator, but I'm struggling to understand why it makes sense to allow erasure via a const_iterator.
This issue has been discussed previously, but I haven't seen an explanation of the rationale behind the change in behaviour.
The best answer I can think of is that the purpose of the change was to make the behaviour of const_iterator analogous to that of a const T*.
Clearly a major reason for allowing delete with const T* is to enable declarations such as:
const T* x = new T;
....
delete x;
However, it also permits the following less desirable behaviour:
int** x = new int*[2];
x[0] = new int[2];
const int* y = x[0];
delete[] y; // deletes x[0] via a pointer-to-const
and I'm struggling to see why it's a good thing for const_iterator to emulate this behaviour.
erase and insert are non-const member functions of the collection. Non-const member functions are the right way to expose mutating operations.
The constness of the arguments are irrelevant; they aren't being used to modify anything. The collection can be modified because the collection is non-const (held in the hidden this argument).
Compare:
template <typename T>
std::vector<T>::iterator std::vector<T>::erase( std::vector<T>::const_iterator pos );
^^^^^ ok
to a similar overload that is NOT allowed
template <typename T>
std::vector<T>::iterator std::vector<T>::erase( std::vector<T>::iterator pos ) const;
wrong ^^^^^
The first question here is whether constness of an iterator is supposed to imply constnes of the entire container or just constness of the elements being iterated over.
Apparently, it has been decided that the latter is the right approach. Constness of an iterator does not imply constness of the container, it only implies constness of container's elements.
Since the beginning of times, construction of an object and its symmetrical counterpart - destruction of an object - were considered meta-operations with regards to the object itself. The object has always been supposed to behave as mutable with regard to these operations, even if it was declared as const. This is the reason you can legally modify const objects in their constructors and destructors. This is the reason you can delete an object of type T through a const T * pointer.
Given the resolution referred to by 1, it becomes clear that 2 should be extended to iterators as well.

Why isn't operator[] overloaded for lvalues and rvalues?

The standard C++ containers offer only one version of operator[] for containers like vector<T> and deque<T>. It returns a T& (other than for vector<bool>, which I'm going to ignore), which is an lvalue. That means that in code like this,
vector<BigObject> makeVector(); // factory function
auto copyOfObject = makeVector()[0]; // copy BigObject
copyOfObject will be copy constructed. Given that makeVector() returns an rvalue vector, it seems reasonable to expect copyOfObject to be move constructed.
If operator[] for such containers was overloaded for rvalue and lvalue objects, then operator[] for rvalue containers could return an rvalue reference, i.e., an rvalue:
template<typename T>
container {
public:
T& operator[](int index) &; // for lvalue objects
T&& operator[](int index) &&; // for rvalue objects
...
};
In that case, copyOfObject would be move constructed.
Is there a reason this kind of overloading would be a bad idea in general? Is there a reason why it's not done for the standard containers in C++14?
Converting comment into answer:
There's nothing inherently wrong with this approach; class member access follows a similar rule (E1.E2 is an xvalue if E1 is an rvalue and E2 names a non-static data member and is not a reference, see [expr.ref]/4.2), and elements inside a container are logically similar to non-static data members.
A significant problem with doing it for std::vector or other standard containers is that it will likely break some legacy code. Consider:
void foo(int &);
std::vector<int> bar();
foo(bar()[0]);
That last line will stop compiling if operator[] on an rvalue vector returned an xvalue. Alternatively - and arguably worse - if there is a foo(const int &) overload, it will silently start calling that function instead.
Also, returning a bunch of elements in a container and only using one element is already rather inefficient. It's arguable that code that does this probably doesn't care much about speed anyway, and so the small performance improvement is not worth introducing a potentially breaking change.
I think you will leave the container in an invalid state if you move out one of the elements, I would argue the need to allow that state at all. Second, if you ever need that, can't you just call the new object's move constructor like this:
T copyObj = std::move(makeVector()[0]);
Update:
Most important point is, again in my opinion, that containers are containers by their nature, so they should not anyhow modify the elements inside them. They just provide a storage, iteration mechanism, etc.

Passing temporaries as LValues

I'd like to use the following idiom, that I think is non-standard. I have functions which return vectors taking advantage of Return Value Optimization:
vector<T> some_func()
{
...
return vector<T>( /* something */ );
}
Then, I would like to use
vector<T>& some_reference;
std::swap(some_reference, some_func());
but some_func doesn't return a LValue. The above code makes sense, and I found this idiom very useful. However, it is non-standard. VC8 only emits a warning at the highest warning level, but I suspect other compilers may reject it.
My question is: Is there some way to achieve the very same thing I want to do (ie. construct a vector, assign to another, and destroy the old one) which is compliant (and does not use the assignment operator, see below) ?
For classes I write, I usually implement assignment as
class T
{
T(T const&);
void swap(T&);
T& operator=(T x) { this->swap(x); return *this; }
};
which takes advantage of copy elision, and solves my problem. For standard types however, I really would like to use swap since I don't want an useless copy of the temporary.
And since I must use VC8 and produce standard C++, I don't want to hear about C++0x and its rvalue references.
EDIT: Finally, I came up with
typedef <typename T>
void assign(T &x, T y)
{
std::swap(x, y);
}
when I use lvalues, since the compiler is free to optimize the call to the copy constructor if y is temporary, and go with std::swap when I have lvalues. All the classes I use are "required" to implement a non-stupid version of std::swap.
Since std::vector is a class type and member functions can be called on rvalues:
some_func().swap(some_reference);
If you don't want useless copies of temporaries, don't return by value.
Use (shared) pointers, pass function arguments by reference to be filled in, insert iterators, ....
Is there a specific reason why you want to return by value?
The only way I know - within the constraints of the standard - to achieve what you want are to apply the expression templates metaprogramming technique: http://en.wikipedia.org/wiki/Expression_templates Which might or not be easy in your case.