How to overload std::remove for std::list? - c++

I once learnt that the general way to erase elements from a container is via the erase-remove-idiom. But I was surprised to find out that at least g++'s STL implementation does not overload std::remove() for std::list, since in this case a lot of object assignments could be saved by doing the reordering via pointer manipulation.
Is there a reason that the C++ standard does not mandate such an optimisation? But my main question is how I can overload std::remove() (it does not have to be portable beyond g++), so that I could provide an implementation that uses list::splice()/list::merge() instead. I tried a couple of signatures, but at best I get an ambiguity error, for example:
template <typename T>
typename std::list<T>::iterator
remove(typename std::list<T>::iterator first,
typename std::list<T>::iterator last, const T &v);
P.S.: I am sorry that I was not clear enough. Please ignore that the functions come from the std namespace and what they do specifically. I just wish to learn more about the template/typetraits/overload rules in C++.

It's not mandated because it's not just an optimization, it has different semantics from what you'd expect for a Sequence container:
std::list<int> l;
l.push_back(1);
l.push_back(2);
std::list<int>::iterator one = l.begin();
std::list<int>::iterator two = l.end(); --two;
if (something) {
l.erase(remove(l.begin(), l.end(), 1), l.end());
// one is still valid and *one == 2, two has been invalidated
} else {
l.remove(1);
// two is still valid and *two == 2, one has been invalidated
}
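As a runnable illustration of the two branches above (a sketch; the two function names are made up for the example), each function reports the value seen through the iterator that stays valid:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Erase-remove idiom: std::remove shifts values between existing nodes by
// assignment, then erase() drops the surplus nodes at the tail.
// Returns the value now held by the node that `one` refers to.
int value_at_first_node_after_erase_remove() {
    std::list<int> l{1, 2};
    std::list<int>::iterator one = l.begin();
    l.erase(std::remove(l.begin(), l.end(), 1), l.end());
    return *one;  // the first node survives, but now holds 2
}

// list::remove unlinks the nodes holding the value; other nodes are untouched.
// Returns the value held by the node that `two` refers to.
int value_at_second_node_after_list_remove() {
    std::list<int> l{1, 2};
    std::list<int>::iterator two = std::prev(l.end());
    l.remove(1);
    return *two;  // the second node survives unchanged and still holds 2
}
```

The erase-remove branch keeps the first node and overwrites its value; list::remove keeps the second node untouched. That observable difference is what rules out silently swapping one implementation for the other.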
Regarding the actual question: I see what you mean, but for the moment I'm stuck on how to write a pair of function templates so that one matches arbitrary iterators and the other matches list iterators, without ambiguity.
Do be aware that there isn't actually any guarantee in the standard that list<T>::iterator is a different type from some_other_container<T>::iterator. So although in practice you'd expect each container to have its own iterator, in principle the approach is flawed quite aside from the fact that you suggested putting the overload in std. You can't use iterators alone to make "structural" changes to their corresponding containers.
You can do this without ambiguity:
template <typename Container>
void erase_all(Container &, const typename Container::value_type &);
template <typename T>
void erase_all(std::list<T> &, const T &);
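A possible implementation of those two declarations (a sketch; the bodies are my assumption, only the signatures come from above). The generic version uses erase-remove, the std::list version delegates to list::remove, and partial ordering picks the latter for a std::list<T> argument:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Generic fallback: the erase-remove idiom (vector, deque, string, ...).
template <typename Container>
void erase_all(Container& c, const typename Container::value_type& v) {
    c.erase(std::remove(c.begin(), c.end(), v), c.end());
}

// std::list overload: unlink matching nodes instead of reassigning elements.
// std::list<T>& is more specialized than Container&, so partial ordering
// selects this overload whenever both are viable.
template <typename T>
void erase_all(std::list<T>& l, const T& v) {
    l.remove(v);
}
```

One caveat of this signature: T is deduced from both parameters, so erase_all(some_list_of_long, 1) fails deduction for the list overload and quietly falls back to the generic one.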

list::remove (or list::remove_if) or list::erase alone will do what you've seen the erase/remove idiom do for vectors:
remove for values, remove_if for predicates, erase for single iterators or ranges.
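For example (a small sketch; the helper name trim is hypothetical), each of these calls relinks nodes instead of shifting values:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Applies the list's own removal members; no std::remove is involved.
std::list<int> trim(std::list<int> l) {
    l.remove(9);                                     // remove by value
    l.remove_if([](int x) { return x % 2 == 0; });   // remove by predicate
    l.erase(l.begin());                              // erase a single iterator
    l.erase(std::next(l.begin(), 2), l.end());       // erase a range
    return l;
}
```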

The advice you received is good, but not universal. It is good for std::vector, for instance, but completely unnecessary for std::list, since std::list::erase() and std::list::remove() already do the right thing. They do all the pointer magic you request, something std::vector::erase() cannot do because its internal storage is different. This is the reason why std::remove() is not specialised for std::list: there is no need to use it in this case.

Related

How to check whether elements of a range should be moved?

There's a similar question: check if elements of a range can be moved?
I don't think the answer in it is a nice solution. Actually, it requires partial specialization for all containers.
I made an attempt, but I'm not sure whether checking operator*() is enough.
// RangeType: the range being inspected
using IteratorType = std::ranges::iterator_t<RangeType>;
using Type = decltype(*std::declval<IteratorType>());
constexpr bool canMove = std::is_rvalue_reference_v<Type>;
Update
The question could be split into two parts:
Could STL algorithms like std::copy/std::uninitialized_copy actually avoid an unnecessary deep copy when the elements are rvalues?
When receiving an rvalue range, how can I check whether it's a range adapter like std::ranges::subrange, or a container that owns its elements, like std::vector?
template <typename InRange, typename OutRange>
void func(InRange&& inRange, OutRange&& outRange) {
using std::begin;
using std::end;
std::copy(begin(inRange), end(inRange), begin(outRange));
// Q1: if `*begin(inRange)` returns an rvalue,
// would each element be move-assigned instead of deep-copied?
}
std::vector<int> vi;
std::list<int> li;
/* ... */
func(std::move(vi), li);
// Q2: Would the elements of vi be moved (shallow-copied)?
// And if not, how could I implement this with a limited number of overloads, instead of one overload per container?
// (e.g. define a C++20 concept describing ranges that own their elements)
Q1 is not a problem, as @Nicol Bolas, @eerorika and @Davis Herring pointed out, and it's not what I was puzzled about.
(Still, I do think the API is confusing; std::assign/std::uninitialized_construct might be better names.)
@alfC gave a great answer to my question (Q2), with a fresh perspective (the move idiom for ranges that own their elements).
To sum up, for most current containers (especially those from the STL), and also for every range adapter, a partial specialization or overload for each of them is the only solution, e.g.:
template <typename Range>
void func(Range&& range) { /*...*/ }
template <typename T>
void func(std::vector<T>&& movableRange) {
auto movedRange = std::ranges::subrange{
std::make_move_iterator(movableRange.begin()),
std::make_move_iterator(movableRange.end())
};
func(movedRange);
}
// and also for `std::list`, `std::array`, etc...
I understand your point.
I do think that this is a real problem.
My answer is that the community has to agree exactly what it means to move nested objects (such as containers).
In any case this needs the cooperation of the container implementors.
And, in the case of standard containers, good specifications.
I am pessimistic that standard containers can be changed to "generalize" the meaning of "move", but that can't prevent new user-defined containers from taking advantage of move idioms.
The problem is that nobody has studied this in depth as far as I know.
As it is now, std::move seems to imply a "shallow" move (one level of moving of the top "value type"), in the sense that you can move the whole thing but not necessarily its individual parts.
This, in turn, makes it useless to try to "std::move" non-owning ranges or ranges that offer pointer/iterator stability.
Some libraries, e.g. those related to std::ranges, simply reject rvalue references to ranges, which I think is only kicking the can down the road.
Suppose you have a container Bag.
What should std::move(bag)[0] and std::move(bag).begin() return? It is really up to the implementation of the container to decide what to return.
It is hard to reason about general data structures, but if the data structure is simple (e.g. a dynamic array), then for consistency with structs (std::move(s).field), std::move(bag)[0] should be the same as std::move(bag[0]). However, the standard already strongly disagrees with me here: https://en.cppreference.com/w/cpp/container/vector/operator_at
And it is possible that it is too late to change.
The same goes for std::move(bag).begin(), which, by my logic, should return a move_iterator (or something like it).
To make things worse, std::array<T, N> works the way I would expect through std::get (std::get<0>(std::move(arr)) is equivalent to std::move(arr[0])).
However, std::move(arr).begin() returns a plain iterator (in practice often just a pointer), so it loses the "forwarding/move" information! It is a mess.
So, yes, to answer your question, you can check whether using Type = decltype(*std::forward<Bag>(bag).begin()); is an rvalue, but more often than not it will not be implemented as an rvalue.
That is, you have to hope for the best and trust that .begin and * are implemented in a very specific way.
You are in better shape by inspecting (somehow) the category of the range itself.
That is, currently you are left to your own devices: if you know that bag is bound to an r-value and the type is conceptually an "owning" value, you currently have to do the dance of using std::make_move_iterator.
I am currently experimenting a lot with custom containers that I have. https://gitlab.com/correaa/boost-multi
However, by trying to allow for this, I break behavior expected for standard containers regarding move.
Also once you are in the realm of non-owning ranges, you have to make iterators movable by "hand".
I have found it empirically useful to distinguish top-level move (std::move) from element-wise move (e.g. bag.mbegin() or bag.moved().begin()).
Otherwise I find myself overloading std::move, which should be a last resort, if anything at all.
In other words, in
template<class MyRange>
void f(MyRange&& r) {
std::copy(std::forward<MyRange>(r).begin(), ..., ...);
}
the fact that r is bound to an r-value doesn't necessarily mean that the elements can be moved, because MyRange can simply be a non-owning view of a larger container that was "just" generated.
Therefore in general you need an external mechanism to detect if MyRange owns the values or not, and not just detecting the "value category" of *std::forward<MyRange>(r).begin() as you propose.
I guess with ranges one can hope in the future to indicate deep moves with some kind of adaptor-like thing "std::ranges::moved_range" or use the 3-argument std::move.
If the question is whether to use std::move or std::copy (or the ranges:: equivalents), the answer is simple: always use copy. If the range given to you has rvalue elements (i.e., its ranges::range_reference_t is either kind(!) of rvalue), you will move from them anyway (so long as the destination supports move assignment).
move is a convenience for when you own the range and decide to move from its elements.
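To see this in action (a sketch; Tracker and the two functions are made up for the demonstration), the same std::copy call copy-assigns from lvalue elements and move-assigns from rvalue elements:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical element type that records whether it was last copy- or move-assigned.
struct Tracker {
    bool move_assigned = false;
    Tracker() = default;
    Tracker(const Tracker&) = default;
    Tracker(Tracker&&) = default;
    Tracker& operator=(const Tracker&) { move_assigned = false; return *this; }
    Tracker& operator=(Tracker&&) noexcept { move_assigned = true; return *this; }
};

// Plain std::copy from an lvalue range: each *out = *in is a copy assignment.
bool copy_from_lvalues_moves() {
    std::vector<Tracker> src(3), dst(3);
    std::copy(src.begin(), src.end(), dst.begin());
    return dst[0].move_assigned;
}

// The same algorithm over move iterators: *in is an rvalue, so move assignment is chosen.
bool copy_from_rvalues_moves() {
    std::vector<Tracker> src(3), dst(3);
    std::copy(std::make_move_iterator(src.begin()),
              std::make_move_iterator(src.end()), dst.begin());
    return dst[0].move_assigned;
}
```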
The answer to the question is: IMPOSSIBLE, at least for the current STL containers.
What if we could add some constraints to the Container requirements?
For example, add a static constant isContainer and write a RangeTraits. This may work well, but it's not the elegant solution I want.
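Such an opt-in trait might look like this (a C++17 sketch; isContainer and RangeTraits are the names proposed above, MyVector is a hypothetical container that opts in). It also shows why it is not elegant: standard containers don't declare the flag, so std::vector is classified as non-owning:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Default: a type without the flag is treated as a non-owning range.
template <typename R, typename = void>
struct RangeTraits {
    static constexpr bool isContainer = false;
};

// Types that declare `static constexpr bool isContainer` report its value.
template <typename R>
struct RangeTraits<R, std::void_t<decltype(R::isContainer)>> {
    static constexpr bool isContainer = R::isContainer;
};

// Hypothetical owning container that opts in to the protocol.
template <typename T>
struct MyVector : std::vector<T> {
    using std::vector<T>::vector;
    static constexpr bool isContainer = true;
};
```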
Inspired by @alfC, I'm considering what the proper behaviour of an rvalue container itself should be, which may help in writing a C++20 concept.
There actually is a way to tell a container from a range adapter, though it cannot be detected today because of a defect in the current library implementations, not in the syntax design.
First of all, the lifetime of a container's elements cannot exceed the lifetime of the container, whereas the elements seen through a range adapter have lifetimes unrelated to the adapter.
That means retrieving an element's address (via an iterator or a reference) from an rvalue container is wrong behaviour.
One thing is often neglected in the post-C++11 era: ref-qualifiers.
Lots of existing member functions, like std::vector::swap, should be lvalue-qualified:
auto getVec() -> std::vector<int>;
//
std::vector<int> vi1;
//getVec().swap(vi1); // pre-C++11 idiom, should be deprecated now
vi1 = getVec(); // move-assignment since C++11
For compatibility reasons, however, this hasn't been adopted. (It's even more confusing that ref-qualifiers haven't been applied to newer additions like std::array and std::forward_list.)
For example, it's easy to implement the subscript operator the way we would expect:
template <typename T>
class MyArray {
T* _items;
size_t _size;
/* ... */
public:
T& operator [](size_t index) & {
return _items[index];
}
const T& operator [](size_t index) const& {
return _items[index];
}
T operator [](size_t index) && {
// not return by `T&&` !!!
return std::move(_items[index]);
}
// or use `deducing this` since C++23
};
OK, then std::move(container)[index] returns the same result as std::move(container[index]) (not exactly: it may cost one additional move), which is convenient when we forward a container.
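A runnable reduction of the subscript part (a sketch; it keeps the three ref-qualified overloads from MyArray above but stores the elements in a std::vector to stay short):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

template <typename T>
class MyArray {
    std::vector<T> items_;
public:
    explicit MyArray(std::vector<T> items) : items_(std::move(items)) {}

    T& operator[](std::size_t i) & { return items_[i]; }
    const T& operator[](std::size_t i) const& { return items_[i]; }
    // On an rvalue array, hand the element out by value, moved from storage,
    // so no reference into a dying container escapes.
    T operator[](std::size_t i) && { return std::move(items_[i]); }
};
```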
However, how about begin and end?
template <typename T>
class MyArray {
T* _items;
size_t _size;
/* ... */
class iterator;
class const_iterator;
using move_iterator = std::move_iterator<iterator>;
public:
iterator begin() & { /*...*/ }
const_iterator begin() const& { /*...*/ }
// may work well for an xvalue, but what about a prvalue?
move_iterator begin() && {
return std::make_move_iterator(begin());
}
// or more directly, using ADL
};
Is it really that simple?
No! Iterators are invalidated when the container is destroyed, so dereferencing an iterator obtained from a temporary (prvalue) is undefined behaviour!
auto getVec() -> std::vector<int>;
///
auto it = getVec().begin(); // Noooo
auto item = *it; // undefined behaviour
Since there's no way (for the programmer) to tell whether an object is a prvalue or an xvalue (both are deduced as T), retrieving an iterator from an rvalue container should be forbidden.
If we could regulate the behaviour of containers by explicitly deleting the functions that obtain an iterator from an rvalue container, then it would be possible to detect this.
A simple demo is here:
https://godbolt.org/z/4zeMG745f
From my perspective, banning such obviously wrong behaviour would not be so disruptive as to make well-implemented old projects fail to compile.
Actually, it just requires a few lines of modification for each container, plus proper constraints or overloads for range access utilities like std::begin/std::ranges::begin.

How to use `boost::range` iterators with standard iterators

I have functions that take in std::vector iterators, as in
typedef std::vector<Point> Points;
Points ConvexHull(Points::const_iterator first, Points::const_iterator last);
I usually pass the std iterators to them, but occasionally I need to work with boost iterators, such as boost::join's range iterator. How should I change the parametrizations of my functions, ideally without templates, so that they accept both iterators? Moreover, how do I indicate in each type which iterator concepts I need?
I tried looking at the boost::range documentation but it's overwhelmingly confusing for me and I don't know where to start.
For example, I couldn't find the difference between boost::range_details::any_forward_iterator_interface and boost::range_details::any_forward_iterator_wrapper, and whether I should use either of those to specify that I need a forward iterator.
Edit:
If I use boost::any_range, how can I pass non-const lvalue references?
For example:
template<typename T>
using Range = boost::any_range<T, boost::random_access_traversal_tag,
T, std::ptrdiff_t>;
void f(Range<Point> &points); // defined elsewhere
// -------------
vector<Point> vec;
f(vec); // error; cannot bind non-const lvalue reference to unrelated type
Boost.Range has any_range for this purpose, and it covers both of your requirements.
https://www.boost.org/doc/libs/1_60_0/libs/range/doc/html/range/reference/ranges/any_range.html
From your example it would look like this:
#include <boost/range/any_range.hpp>
typedef boost::any_range<Point,
boost::bidirectional_traversal_tag,
Point,
std::ptrdiff_t
> PointRange;
You should strongly consider using a template. Doing so lets the compiler keep useful information about what operations are actually occurring, which greatly helps it generate optimised output. The std:: convention is to name the type parameter after the concept required, e.g.:
template< class BidirIt, class UnaryPredicate > // anything bidirectional (which includes random access)
BidirIt std::stable_partition( BidirIt first, BidirIt last, UnaryPredicate p );
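Applied to the question's own problem, a template that names its parameter after the required concept could look like this (a sketch; Point and leftmost are stand-ins, since the body of ConvexHull is not shown). Because it only needs forward traversal, it accepts std::vector and std::list iterators alike:

```cpp
#include <cassert>
#include <list>
#include <vector>

struct Point { double x, y; };  // stand-in for the question's Point

// Returns an iterator to the point with the smallest x (last if the range is empty).
// A single forward pass suffices, hence the ForwardIt name.
template <class ForwardIt>
ForwardIt leftmost(ForwardIt first, ForwardIt last) {
    ForwardIt best = first;
    for (; first != last; ++first)
        if (first->x < best->x) best = first;
    return best;
}
```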
If you really don't want a template, you still shouldn't name anything in a detail namespace. Something like
#include <boost/range/any_range.hpp>
using PointRange = boost::any_range<Point, boost::random_access_traversal_tag>; // or another traversal tag.
using PointIterator = PointRange::iterator;
You will rarely need to pass a PointRange & (about as rarely as, say, an int *&). Passing by value is almost always the correct behaviour: a PointRange is cheap to copy, as it holds a begin and an end iterator from the range it was constructed from, nothing more.

Is it standard C++ to call move() with an output iterator that has been moved previously?

While brushing up on algorithm design and learning C++11 at the same time, I came up with the following implementation for heap sort:
template <typename It, typename Comp>
void heapSort(It begin, It end, Comp compFunc, std::random_access_iterator_tag)
{
std::make_heap(begin, end, compFunc);
std::sort_heap(begin, end, compFunc);
}
template <typename It, typename Comp, typename IterCat>
void heapSort(It begin, It end, Comp compFunc, IterCat)
{
typedef typename It::value_type value_type;
std::vector<value_type> randomAccessContainer;
randomAccessContainer.reserve(std::distance(begin, end));
std::move(begin, end, std::back_inserter(randomAccessContainer));
heapSort(std::begin(randomAccessContainer), std::end(randomAccessContainer), compFunc, std::random_access_iterator_tag());
std::move(std::begin(randomAccessContainer), std::end(randomAccessContainer), begin);
}
Is it standard C++ to first move from [begin, end) into a new container, and then move from that container back into [begin, end)?
I was initially confused by your use of the word "standard", and edited the question so that it was asking whether this was "legal". The answer to that question would be: "Yes, it is perfectly legal". After the elements in the original range are moved from, they are still in a valid (even though unspecified) state.
Hence, the second call to std::move() will just move-assign elements, and the type of those elements shall have a move-assignment operator without pre-conditions. As long as this is the case, I see no problem with that.
After editing your question, though, I started to wonder whether you actually wanted to ask if this is "standard", meaning "common practice", which is why I restored the original wording.
The answer to this question is "Partly". You would normally initialize your temporary vector by using a couple of move iterators rather than invoking std::move:
std::vector<value_type> randomAccessContainer(
std::make_move_iterator(begin),
std::make_move_iterator(end)
);
Apart from this, your implementation seems correct to me.
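Putting it together, a complete version with the move-iterator initialization and a small dispatching entry point might look like this (a sketch; the three-argument heapSort overload and its std::less<> default are my additions):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <vector>

// Random-access case: sort in place with the heap algorithms.
template <typename It, typename Comp>
void heapSort(It begin, It end, Comp comp, std::random_access_iterator_tag) {
    std::make_heap(begin, end, comp);
    std::sort_heap(begin, end, comp);
}

// Other categories: move the elements into a vector, sort there,
// then move them back into [begin, end).
template <typename It, typename Comp, typename IterCat>
void heapSort(It begin, It end, Comp comp, IterCat) {
    using value_type = typename std::iterator_traits<It>::value_type;
    std::vector<value_type> buffer(std::make_move_iterator(begin),
                                   std::make_move_iterator(end));
    heapSort(buffer.begin(), buffer.end(), comp, std::random_access_iterator_tag{});
    std::move(buffer.begin(), buffer.end(), begin);
}

// Entry point: dispatch on the iterator category.
template <typename It, typename Comp = std::less<>>
void heapSort(It begin, It end, Comp comp = Comp{}) {
    heapSort(begin, end, comp, typename std::iterator_traits<It>::iterator_category{});
}
```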
No, it isn't.
I can see how you'd need it to be a random access container if it wasn't already. In that case prefer std::make_move_iterator:
std::vector<value_type> randomAccessContainer(
std::make_move_iterator(begin),
std::make_move_iterator(end));
In all other cases, you'd want to be sorting in-place. (Unless you need "no effects" on exception, maybe)

Is it possible to test whether two iterators point to the same object?

Say I'm making a function to copy a value:
template<class ItInput, class ItOutput>
void copy(ItInput i, ItOutput o) { *o = *i; }
and I would like to avoid the assignment if i and o point to the same object, since then the assignment is pointless.
Obviously, I can't say if (i != o) { ... }, both because i and o might be of different types and because they might point into different containers (and would thus be incomparable). Less obviously, I can't use overloaded function templates either, because the iterators might belong to different containers even though they have the same type.
My initial solution to this was:
template<class ItInput, class ItOutput>
void copy(ItInput i, ItOutput o)
{
if (&*o != static_cast<void const *>(&*i))
*o = *i;
}
but I'm not sure if this works. What if *o or *i actually returns an object instead of a reference?
Is there a way to do this generally?
I don't think that this is really necessary: if assignment is expensive, the type should define an assignment operator that performs the (relatively cheap) self assignment check to prevent doing unnecessary work. But, it's an interesting question, with many pitfalls, so I'll take a stab at answering it.
If we are to assemble a general solution that works for input and output iterators, there are several pitfalls that we must watch out for:
An input iterator is a single-pass iterator: you can only perform indirection via the iterator once per element, so we can't perform indirection via the iterator once to get the address of the pointed-to value and a second time to perform the copy.
An input iterator may be a proxy iterator. A proxy iterator is an iterator whose operator* returns an object, not a reference. With a proxy iterator, the expression &*it is ill-formed, because *it is an rvalue (it's possible to overload the unary-&, but doing so is usually considered evil and horrible, and most types do not do this).
An output iterator can only be used for output; you cannot perform indirection via it and use the result as an rvalue. You can write to the "pointed to element" but you can't read from it.
So, if we're going to make your "optimization," we'll need to make it only for the case where both iterators are forward iterators (this includes bidirectional iterators and random access iterators: they're forward iterators too).
Because we're nice, we also need to be mindful that, although it violates the concept requirements, many proxy iterators misrepresent their category, because it is very useful to have a proxy iterator that supports random access over a sequence of proxied objects. (I'm not even sure how one could implement an efficient iterator for std::vector<bool> without doing this.)
We'll use the following Standard Library headers:
#include <iterator>
#include <memory>      // std::addressof
#include <type_traits>
#include <utility>     // std::declval
We define a metafunction, is_forward_iterator, that tests whether a type is a "real" forward iterator (i.e., is not a proxy iterator):
template <typename T>
struct is_forward_iterator :
std::integral_constant<
bool,
std::is_base_of<
std::forward_iterator_tag,
typename std::iterator_traits<T>::iterator_category
>::value &&
std::is_lvalue_reference<
decltype(*std::declval<T>())
>::value>
{ };
For brevity, we also define a metafunction, can_compare, that tests whether two types are both forward iterators:
template <typename T, typename U>
struct can_compare :
std::integral_constant<
bool,
is_forward_iterator<T>::value &&
is_forward_iterator<U>::value
>
{ };
Then, we'll write two overloads of the copy function and use SFINAE to select the right overload based on the iterator types: if both iterators are forward iterators, we'll include the check, otherwise we'll exclude the check and always perform the assignment:
template <typename InputIt, typename OutputIt>
auto copy(InputIt const in, OutputIt const out)
-> typename std::enable_if<can_compare<InputIt, OutputIt>::value>::type
{
if (static_cast<void const volatile*>(std::addressof(*in)) !=
static_cast<void const volatile*>(std::addressof(*out)))
*out = *in;
}
template <typename InputIt, typename OutputIt>
auto copy(InputIt const in, OutputIt const out)
-> typename std::enable_if<!can_compare<InputIt, OutputIt>::value>::type
{
*out = *in;
}
As easy as pie!
I think this may be a case where you may have to document some assumptions about the types you expect in the function and be content with not being completely generic.
Like operator*, operator& could be overloaded to do all sorts of things. If you're guarding against operator*, then you should consider operator& and operator!=, etc.
I would say that a good prerequisite to enforce (either through comments in the code or a concept/static_assert) is that operator* returns a reference to the object pointed to by the iterator and that it doesn't (or shouldn't) perform a copy. In that case, your code as it stands seems fine.
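One way to enforce that prerequisite is a static_assert (a sketch; checked_copy is a hypothetical name). It rejects proxy iterators such as std::vector<bool>::iterator at compile time:

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <type_traits>
#include <vector>

// Requires both iterators to yield real references, so comparing the
// addresses of *i and *o is meaningful; skips self-assignment.
template <class ItInput, class ItOutput>
void checked_copy(ItInput i, ItOutput o) {
    static_assert(std::is_lvalue_reference_v<decltype(*i)> &&
                  std::is_lvalue_reference_v<decltype(*o)>,
                  "checked_copy requires iterators whose operator* returns a reference");
    if (static_cast<const volatile void*>(std::addressof(*o)) !=
        static_cast<const volatile void*>(std::addressof(*i)))
        *o = *i;
}
```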
Your code, as is, is definitely not okay, or at least not okay for all iterator categories.
Input iterators and output iterators are not required to be dereferenceable after the first time (they're expected to be single-pass), and input iterators are allowed to dereference to anything "convertible to T" (§24.2.3/2).
So, if you want to handle all kinds of iterators, I don't think you can enforce this "optimization", i.e. you can't generically check if two iterators point to the same object. If you're willing to forego input and output iterators, what you have should be fine. Otherwise, I'd stick with doing the copy in any case (I really don't think you have another option on this).
Write a helper template function equals that automatically returns false if the iterators are different types. Either that or do a specialization or overload of your copy function itself.
If they're the same type then you can use your trick of comparing the pointers of the objects they resolve to, no casting required:
if (&*i != &*o)
*o = *i;
If *i or *o doesn't return a reference, though, &*i or &*o won't compile for a plain temporary, so for such iterators the overload has to skip the check and always perform the copy, which is wasteful at worst but harmless.
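A sketch of that helper (equals and copy_value are hypothetical names): the two-type overload always answers false, and partial ordering selects the one-type overload when both iterators have the same type:

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <vector>

// Iterators of different types are treated as never referring to the same object.
template <typename ItA, typename ItB>
bool equals(const ItA&, const ItB&) { return false; }

// Same iterator type: compare the addresses of the referenced objects.
// Assumes operator* returns a real reference, not a proxy.
template <typename It>
bool equals(const It& a, const It& b) {
    return std::addressof(*a) == std::addressof(*b);
}

template <typename ItInput, typename ItOutput>
void copy_value(ItInput i, ItOutput o) {
    if (!equals(i, o)) *o = *i;
}
```

As the last assertion shows, this treats iterator and const_iterator as different types, so it misses the case where they refer to the same element.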

Get a pointer to STL container an iterator is referencing?

For example, the following is possible:
std::set<int> s;
std::set<int>::iterator it = s.begin();
I wonder if the opposite is possible, say,
std::set<int>* pSet = it->getContainer(); // something like this...
No, there is no portable way to do this.
An iterator may not even have a reference to the container. For example, an implementation could use T* as the iterator type for both std::array<T, N> and std::vector<T>, since both store their elements as arrays.
In addition, iterators are far more general than containers, and not all iterators point into containers (for example, there are input and output iterators that read from and write to streams).
No. You must remember the container that an iterator came from, at the time that you find the iterator.
A possible reason for this restriction is that pointers were meant to be valid iterators and there's no way to ask a pointer to figure out where it came from (e.g. if you point 4 elements into an array, how from that pointer alone can you tell where the beginning of the array is?).
It is possible with at least one of the std iterators and some trickery.
The std::back_insert_iterator needs a pointer to the container to call its push_back method. Moreover, this pointer is only protected, not private.
#include <iterator>
#include <vector>
template <typename Container>
struct get_a_pointer_iterator : std::back_insert_iterator<Container> {
typedef std::back_insert_iterator<Container> base;
get_a_pointer_iterator(Container& c) : base(c) {}
Container* getPointer(){ return base::container;}
};
#include <iostream>
int main() {
std::vector<int> x{1};
auto p = get_a_pointer_iterator<std::vector<int>>(x);
std::cout << (*p.getPointer()).at(0);
}
This is of course of no practical use, but merely an example of a std iterator that does carry a pointer to its container, though a quite special one (e.g. incrementing a std::back_insert_iterator is a no-op). The whole point of using iterators is not to know where the elements are coming from. On the other hand, if you ever wanted an iterator that lets you get a pointer to the container, you could write one.
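For instance, a minimal wrapper along those lines could look like this (a sketch; container_iterator is hypothetical and only provides the operations used in the example, not a full iterator interface), reusing the getContainer name from the question:

```cpp
#include <cassert>
#include <set>

// Hypothetical iterator wrapper that remembers which container it came from.
template <typename Container>
class container_iterator {
    Container* container_;
    typename Container::iterator it_;
public:
    container_iterator(Container& c, typename Container::iterator it)
        : container_(&c), it_(it) {}

    decltype(auto) operator*() const { return *it_; }
    container_iterator& operator++() { ++it_; return *this; }
    bool operator==(const container_iterator& other) const { return it_ == other.it_; }
    bool operator!=(const container_iterator& other) const { return !(*this == other); }

    Container* getContainer() const { return container_; }
};
```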