RangeV3: why not specialize algorithm for specific containers? - c++

The "old" STL does not allow say std::find to be specialized to use the member functions for std::set or std::unordered_map, etc. It cannot, since the former use iterators, the latter need the containers, and from iterators there are no means to get the containers. There is even a question about that (Specializing STL algorithms so they automatically call efficient container member functions when available).
Since Range V3 finally moves the focus from iterators to ranges, one could expect that finally std::find would be specialized automatically for std::set, which in turn makes std::none_of_equal (if it were adopted) do what you'd like, etc.
However, reading its current implementation, I see no provision for such a feature. Why?

Related

No find member in std::vector

Is there any particular reason why std::vector does not have a member function find? Instead you have to call std::find (and #include <algorithm>).
The reason why I ask is that I think it would be a good thing to be able to change container class in some piece of implementation without having to change the code wherever the container is accessed. Say I replace an std::vector where the implementation uses std::find with an std::map. Then I also have to replace the call of std::find with a call to the member find, unless I want to keep the linear complexity of std::find. Why not just have a member find for all container classes, which finds an element with whatever algorithm is best suited for each container?
Conceptually, std::find only requires two InputIterators to work and not an std::vector. As such, one implementation works for all containers including STL containers, and standard arrays, and anything that can supply an InputIterator, including for example an istream_iterator() - nice!
So, instead of providing one find() method for every container (and take into account that for some it might not be possible, like standard arrays), one single, generic find() function is provided for all. This likely makes your code more resilient to change than adding a find() method for each container since it provides a consistent interface to search in any collection: an input stream from console, network etc., or just a basic array. This is an important aspect of the STL generic design philosophy: you can search for elements in any collection/range defined by two InputIterators.
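To make the "two InputIterators" point concrete, here is a minimal sketch. The names contains and found_in_stream are made up for this illustration; only std::find and std::istream_iterator come from the standard library.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {

// One generic search for anything that can hand out two input iterators.
template <typename InputIt, typename T>
bool contains(InputIt first, InputIt last, const T& value) {
    return std::find(first, last, value) != last;
}

// The same helper searching a stream of whitespace-separated ints.
inline bool found_in_stream(const std::string& text, int value) {
    std::istringstream in(text);
    return contains(std::istream_iterator<int>(in),
                    std::istream_iterator<int>(), value);
}

}  // namespace sketch
```

The same contains call works on a std::vector, a built-in array (via std::begin/std::end) and an input stream, which is exactly what a per-container member find() could not offer.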
The downside, as you note, is that in some cases, better performance may be achieved using the container's own method, which can make special assumptions to improve performance (similarly for list::remove, unordered_map::erase/find() etc.). For this reason, containers can provide (and this is a well known design feature of STL) a method specifically for performance reasons: for example a std::unordered_map does not require one to iterate through the entire map to find an element.
In summary, since the generic std::find already works efficiently for a vector, there is no need to provide a member function; adding one might even encourage less portable designs.
For all things STL related see The C++ Standard Library - A Tutorial and Reference, 2nd Edition
std::find is a common solution. Some classes need a specialized version of this function, which is why they provide it as a member. (As I remember, Meyers said that if a class has its own member function, you should use it instead of the globally defined one.)

per-container optimizations for functions that operate on iterators

So I know the point of iterators is to abstract the underlying container so you don't have to worry about what it is.
But say you wanted to write an optimized version of merge sort and wanted to do an in place sort if the underlying container was a linked list, since you can run merge sort in-place on a linked list without the need for extra container allocations.
Is there any way to get this information to know whether you're operating on a linked structure and/or access the pointers for Standard Library and other containers?
I'm assuming there is a way since that is what std::sort does? How?
I have also always wished for a way to do this, but the answer is kind-of, but not really. This isn't possible in general, because of the template deduction rules.
Each iterator may be a member of a class, but since multiple classes may have the same iterators, if a function receives these iterators, it is literally and theoretically impossible to deduce which container it came from. That hinders things somewhat.
If you don't need the container type and can work with the iterator type alone, then yes, std::sort could optimize based on the underlying container. But no, std::sort doesn't have a special algorithm for node based containers in any C++ standard library that I know of.
The containers themselves sometimes have specialized versions, see std::list<T>::sort, but the generic std::sort can't make use of that since it works on any arbitrary range, and std::list<T>::sort only works on the entire container.
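If you do have the container at hand, you can still get the node-relinking sort for a sub-range by splicing. This is only a sketch; sort_sublist is a hypothetical helper, not something the standard library provides. Note that T is deduced from the list argument, not from the iterators.

```cpp
#include <iterator>
#include <list>

namespace sketch {

// Sorts [first, last) of `l` with std::list's member sort.
// Nodes are relinked; no element is copied or moved.
template <typename T>
void sort_sublist(std::list<T>& l,
                  typename std::list<T>::iterator first,
                  typename std::list<T>::iterator last) {
    std::list<T> tmp;
    tmp.splice(tmp.begin(), l, first, last);  // detach the sub-range
    tmp.sort();                               // member merge sort
    l.splice(last, tmp);                      // reattach before `last`
    // std::sort(first, last) would not compile here:
    // it requires random-access iterators.
}

}  // namespace sketch
```

The price is that the caller must pass the container as well as the iterators, which is precisely the information a pure iterator-based std::sort never receives.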
I really wish they would. Also a specialization of std::lower_bound and similar when called on the tree-based containers would be awesome.
A library implementor can do a certain set of things that are not portable and depend on the details of the implementation. In the standard, the iterator types are referred to as nested types in the container, and nested types are not deducible. However, the implementation can decide to implement those as types with namespace scope, which are deducible, and provide a nested typedef. The implementation could then, within the limits of the requirements placed in the standard, optimize the operations.
I don't know of any implementation that implements std::sort as merge sort for lists, but there are other tricks that are used in implementations, for example std::copy when the iterators are into vectors holding POD types is implemented in some libraries by calling std::memmove rather than copying one element at a time.
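A rough sketch of how such a memmove shortcut can be selected at compile time. This is not how any particular library spells it; copy_range, copy_impl and clone are invented names, and the sketch only handles raw pointers into contiguous storage.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch {

// Fallback: assign one element at a time.
template <typename T>
T* copy_impl(const T* first, const T* last, T* out, std::false_type) {
    for (; first != last; ++first, ++out) *out = *first;
    return out;
}

// Fast path for trivially copyable types: one bulk memmove.
template <typename T>
T* copy_impl(const T* first, const T* last, T* out, std::true_type) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n != 0) std::memmove(out, first, n * sizeof(T));
    return out + n;
}

// Picks one of the two overloads from a type trait, at compile time.
template <typename T>
T* copy_range(const T* first, const T* last, T* out) {
    return copy_impl(first, last, out, std::is_trivially_copyable<T>());
}

// Usage: vectors store their elements contiguously, so data() works.
template <typename T>
std::vector<T> clone(const std::vector<T>& src) {
    std::vector<T> dst(src.size());
    copy_range(src.data(), src.data() + src.size(), dst.data());
    return dst;
}

}  // namespace sketch
```

With std::vector<int> the memmove overload is chosen; with std::vector<std::string> the element-wise loop is used, since std::string is not trivially copyable.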
As I mentioned before, while this can be done by the implementor, writing your own code attempting to do the same would be non-portable and brittle, as it would depend on details of one implementation that might change in the next version of the library or even by changing compiler flags.

Accumulating many vectors into a single container w/o copying

Is it possible in C++11 to accumulate many std::vectors, each returned by a given function (whose API I cannot change), into a std container without copying any element?
std::vector<int> make_vect();
container acc; // what is container?
do {
acc.append(std::move(make_vect())); // how to implement this?
} while(acc.size() < n);
Note 1 that elements must not be copied even if they have no move constructor or move assignment operator, such as int in the example. So you can move a chunk of elements (by copying a pointer), but not individual elements.
Note 2 that container must allow for iteration through all accumulated elements using a single iterator. So std::vector<std::vector<>> or similar is not allowed.
Clearly, it is straightforward to write some container allowing this or to use std::list<std::vector<>> and provide your own iterator, but does the std library provide the desired functionality without such user-written additions?
It seems that the requested functionality is nothing particularly outlandish and I'm surprised how hard (if not impossible) it is even with C++11.
TL;DR I don't think a stock Standard Container can do what you ask. Here's why.
Remember that move semantics for containers are efficient because they are implemented as scope-bound handles to dynamically allocated memory. Moving a container is implemented as copying the handle, while not touching the contained elements.
Your first (explicit) constraint is not to copy any element of any of the containers. That necessitates copying the handles into your putative acc_container. In other words, you want an acc_container<std::vector<T>>. Any of the Standard Containers will allow you to do that efficiently no matter how big the individual T elements are.
Your second (implicit, inferred from the comments) constraint is that you want to have a uniform interface over all the elements of all individual vectors. In other words, you would like to use this as acc_container<T>. This requires an extra level of indirection in the iterators of your acc_container, where the iterator detects that it has reached the end of the current vector<T> and jumps to the beginning of the next vector<T>.
Such a container does not exist in the Standard Library.
The easiest work-around is to use a std::vector<std::vector<T>> (to avoid copying T elements), and write your own iterator adaptors (e.g. using boost::indirect_iterator, to provide iteration over T elements).
Unfortunately, even if you provide non-member begin() / end() functions that initialize these indirect iterators from the member .begin()/.end(), range-for will not use ADL to look up these functions because it will prefer the old member functions .begin() / .end(). Furthermore, you will not be able to e.g. insert() a T element directly into your compound container, unless you also provide non-member insert() (and similarly for other functionality).
So if you want a bona fide compound container, with range-for support and a native interface of member functions, you need to write one yourself (with std::vector<std::vector<T>> as back-end, and the entire std::vector<T> interface written on top of it). Maybe you can suggest it on the Boost mailing list as a nice project.
UPDATE: here's a link to an old paper by Matt Austern on Segmented Iterators and Hierarchical Algorithms that shows some performance benefits from this approach. The downside is that you also need to make standard algorithms aware of these iterators.
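For what it's worth, here is a minimal sketch of the "extra level of indirection" idea: a hand-written forward iterator over a std::vector<std::vector<T>> that skips to the next chunk when the current one is exhausted. All names (flat_iterator, flat_begin, flat_end, append) are invented for this example, it is read/write but forward-only, and it does not work for T = bool.

```cpp
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

namespace sketch {

template <typename T>
class flat_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = T*;
    using reference = T&;

    flat_iterator() : chunks_(nullptr), outer_(0), inner_(0) {}
    flat_iterator(std::vector<std::vector<T>>& chunks, std::size_t outer)
        : chunks_(&chunks), outer_(outer), inner_(0) {
        skip_exhausted();
    }

    reference operator*() const { return (*chunks_)[outer_][inner_]; }
    pointer operator->() const { return &(*chunks_)[outer_][inner_]; }

    flat_iterator& operator++() {
        ++inner_;
        skip_exhausted();
        return *this;
    }
    flat_iterator operator++(int) {
        flat_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const flat_iterator& a, const flat_iterator& b) {
        return a.outer_ == b.outer_ && a.inner_ == b.inner_;
    }
    friend bool operator!=(const flat_iterator& a, const flat_iterator& b) {
        return !(a == b);
    }

private:
    // Move past chunks that are empty or fully consumed.
    void skip_exhausted() {
        while (chunks_ && outer_ < chunks_->size() &&
               inner_ == (*chunks_)[outer_].size()) {
            ++outer_;
            inner_ = 0;
        }
    }

    std::vector<std::vector<T>>* chunks_;
    std::size_t outer_;
    std::size_t inner_;
};

template <typename T>
flat_iterator<T> flat_begin(std::vector<std::vector<T>>& chunks) {
    return flat_iterator<T>(chunks, 0);
}

template <typename T>
flat_iterator<T> flat_end(std::vector<std::vector<T>>& chunks) {
    return flat_iterator<T>(chunks, chunks.size());
}

// Moves the vector's buffer handle into the accumulator; no T is copied.
template <typename T>
void append(std::vector<std::vector<T>>& chunks, std::vector<T>&& v) {
    chunks.push_back(std::move(v));
}

}  // namespace sketch
```

Appending only moves vector handles, so individual elements are never copied, and a single iterator type walks all accumulated elements. Everything else of the std::vector<T> interface (insert, size, random access and so on) would still have to be written by hand.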

Specializing STL algorithms so they automatically call efficient container member functions when available

The STL has global algorithms that can operate on arbitrary containers as long as they support the basic requirements of that algorithm. For example, some algorithms may require a container to have random access iterators like a vector rather than a list.
When a container has a faster way of doing something than the generic algorithm would, it provides a member function with the same name to achieve the same goal - like a list providing its own remove_if() since it can remove elements by just doing pointer operations in constant time.
My question is - is it possible/advisable to specialize the generic algorithms so they automatically call the member function version for containers where it's more efficient? E.g. have std::remove_if call list::remove_if internally for lists. Is this already done in the STL?
Not in the case of remove_if, since the semantics are different. std::remove_if doesn't actually erase anything from the container, whereas list::remove_if does, so you definitely don't want the former calling the latter.
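A small sketch of that difference in semantics. The helper function names are made up for the example; the standard calls are std::remove_if, vector::erase and list::remove_if.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

namespace sketch {

inline bool is_even(int x) { return x % 2 == 0; }

// std::remove_if alone only shuffles elements; the size is unchanged.
inline std::size_t size_after_remove_if_only(std::vector<int> v) {
    auto new_end = std::remove_if(v.begin(), v.end(), is_even);
    (void)new_end;  // [new_end, v.end()) holds leftover, unspecified values
    return v.size();
}

// The erase-remove idiom actually shrinks the vector.
inline std::size_t size_after_erase_remove(std::vector<int> v) {
    v.erase(std::remove_if(v.begin(), v.end(), is_even), v.end());
    return v.size();
}

// The list member erases matching nodes directly.
inline std::size_t size_after_list_remove_if(std::list<int> l) {
    l.remove_if(is_even);
    return l.size();
}

}  // namespace sketch
```

Starting from {1, 2, 3, 4}, the first function still reports four elements, while the other two report two.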
Neither you nor the implementation can literally specialize the generic algorithms for containers because the algorithms are function templates that take iterators, and the containers are themselves class templates, whose iterator type depends on the template parameter. So in order to specialize std::remove_if generically for list<T>::iterator you would need a partial specialization of remove_if, and there ain't no such thing as a partial specialization of a function template.
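The usual workaround for the missing partial specialization of function templates is to forward to a class template, which can be partially specialized. A minimal sketch with invented names (strategy, strategy_for); it specializes on T* because that pattern is deducible, whereas list<T>::iterator is not.

```cpp
#include <list>
#include <string>

namespace sketch {

// Primary template: used for any iterator type without a better match.
template <typename Iterator>
struct strategy {
    static std::string name() { return "generic"; }
};

// Partial specialization: legal for class templates, not for functions.
template <typename T>
struct strategy<T*> {
    static std::string name() { return "raw pointer"; }
};

// The function template stays unspecialized and just delegates.
template <typename Iterator>
std::string strategy_for(Iterator) {
    return strategy<Iterator>::name();
}

// Usage with a node-based iterator and with a raw pointer.
inline std::string list_case() {
    std::list<int> l{1};
    return strategy_for(l.begin());
}

inline std::string pointer_case() {
    int x = 0;
    return strategy_for(&x);
}

}  // namespace sketch
```

A real algorithm would put its generic and optimized bodies where this sketch merely returns a label.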
I can't remember whether implementations are allowed to overload algorithms for particular iterator types, but even if not, the "official" algorithm can call a function that could be overloaded, or it can use a class that could be partially specialized. Unfortunately none of these techniques help you if you've written your own container, and have spotted a way to make a standard algorithm especially efficient for it.
Suppose for example you have a container with a random-access iterator, but where you have an especially efficient sort technique that works with the standard ordering: a bucket sort, perhaps. Then you might think of putting a free function template <typename T> void sort(MyContainer<T>::iterator first, MyContainer<T>::iterator last) in the same namespace as the class, and allow people to call it with using std::sort; sort(it1, it2); instead of std::sort(it1, it2);. The problem is that if they do that in generic code, they risk that someone else writing some other container type will have a function named sort that doesn't even sort the range (the English word "sort" has more than one meaning, after all). So basically you cannot generically sort an iterator range in a way that picks up on efficiencies for user-defined containers.
When the difference in the code depends only on the category of the iterator (for example std::distance which is fast for random access iterators and slow otherwise), this is done using something called "iterator tag dispatch", and that's the most common case where there's a clear efficiency difference between different containers.
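For reference, iterator tag dispatch looks roughly like this. It is a simplified sketch in the spirit of std::distance, not any library's actual source; my_distance and distance_impl are made-up names.

```cpp
#include <iterator>
#include <list>
#include <vector>

namespace sketch {

// Slow path: anything that is at least an input iterator. O(n).
template <typename It>
typename std::iterator_traits<It>::difference_type
distance_impl(It first, It last, std::input_iterator_tag) {
    typename std::iterator_traits<It>::difference_type n = 0;
    for (; first != last; ++first) ++n;
    return n;
}

// Fast path: random-access iterators support subtraction. O(1).
template <typename It>
typename std::iterator_traits<It>::difference_type
distance_impl(It first, It last, std::random_access_iterator_tag) {
    return last - first;
}

// Entry point: the iterator's category tag selects the overload.
template <typename It>
typename std::iterator_traits<It>::difference_type
my_distance(It first, It last) {
    return distance_impl(
        first, last,
        typename std::iterator_traits<It>::iterator_category());
}

// Usage: same call, different path per container.
inline bool demo() {
    std::list<int> l{1, 2, 3};
    std::vector<int> v{1, 2, 3, 4, 5};
    return my_distance(l.begin(), l.end()) == 3 &&
           my_distance(v.begin(), v.end()) == 5;
}

}  // namespace sketch
```

The list iterator's bidirectional tag converts to std::input_iterator_tag through inheritance, so it lands in the loop; the vector iterator matches the random-access overload exactly.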
If there are any remaining cases that apply to standard containers (discounting ones where the result is different or where the efficiency only requires a particular iterator category), let's have them.
It is not possible - the algorithms work with iterators, and iterators have no knowledge of the container object they refer to. Even if they did, there would be no way to determine at compile time whether a given iterator range refers to the whole of a container, so it could not be done by specialisation alone; there would need to be an extra runtime check.
The only way to do this would be to create your own wrapper templates for each of the algorithms, taking a container rather than a pair of iterators. Then you could specialize your wrapper for each container type that could be optimized. Unfortunately this removes a lot of the flexibility of the standard algorithms, and litters your programs with a bunch of non-standard calls.
What you are looking for isn't specialization, but overloading. You could provide alternative versions of the algorithms (not, legally, in namespace std, though) that take containers as arguments, rather than iterator pairs, and either call the STL counterparts with begin() and end() of the container, or do something else. Such an approach of course does require the code to call your functions instead of the STL functions, though.
This has certainly been done before, so you might actually find a set of headers out there saving you the work to write all this boilerplate code.
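A sketch of what such a container-taking wrapper could look like, using expression SFINAE to prefer a member find when one exists. The names find_in and find_impl are hypothetical. Beware that a member named find need not mean the same thing everywhere: std::string::find returns an index, and std::map::find takes a key rather than a value_type.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

namespace sketch {

// Preferred overload: only viable if c.find(value) compiles.
template <typename Container, typename T>
auto find_impl(Container& c, const T& value, int)
    -> decltype(c.find(value)) {
    return c.find(value);
}

// Fallback: linear search through the generic algorithm.
template <typename Container, typename T>
auto find_impl(Container& c, const T& value, long)
    -> decltype(std::begin(c)) {
    return std::find(std::begin(c), std::end(c), value);
}

// Passing the literal 0 (an int) makes the first overload win
// whenever it is viable; otherwise 0 converts to long.
template <typename Container, typename T>
auto find_in(Container& c, const T& value)
    -> decltype(find_impl(c, value, 0)) {
    return find_impl(c, value, 0);
}

}  // namespace sketch
```

With this, find_in(some_set, x) uses the logarithmic std::set::find, while find_in(some_vector, x) falls back to std::find.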

Do you have to implement multiple iterators in a STL-like class?

I'm quite familiar with the STL and how to use it. My question is...
If I were to implement my own STL container type, how are the internal iterators defined? STL classes tend to have sequential or random-access iterators, const_ versions of these, and stream iterators.
Are these iterators all fully-defined in every STL class, or is there some sort of base class that you inherit from to gain most of the iterator functionality? Does anyone know a good reference for how to implement a class that supports these different kinds of iterators?
Generally, you only have to implement iterator and const_iterator.
If reverse iterators are desired, they can be obtained using instantiations of std::reverse_iterator. The stream iterators will use operator>> and operator<<; typically, they aren't appropriate for a container (and none of the standard containers provides them).
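As a minimal sketch of that advice: a toy container (fixed_buffer is an invented name) whose storage is contiguous, so plain pointers can serve as iterator and const_iterator, with the reverse iterators obtained from std::reverse_iterator. It assumes N is greater than zero.

```cpp
#include <cstddef>
#include <iterator>

namespace sketch {

template <typename T, std::size_t N>
class fixed_buffer {
public:
    using value_type = T;
    // Contiguous storage: raw pointers already model random-access iterators.
    using iterator = T*;
    using const_iterator = const T*;
    // Reverse iterators come for free from the adaptor.
    using reverse_iterator = std::reverse_iterator<iterator>;
    using const_reverse_iterator = std::reverse_iterator<const_iterator>;

    iterator begin() { return data_; }
    iterator end() { return data_ + N; }
    const_iterator begin() const { return data_; }
    const_iterator end() const { return data_ + N; }
    const_iterator cbegin() const { return data_; }
    const_iterator cend() const { return data_ + N; }

    reverse_iterator rbegin() { return reverse_iterator(end()); }
    reverse_iterator rend() { return reverse_iterator(begin()); }
    const_reverse_iterator rbegin() const {
        return const_reverse_iterator(end());
    }
    const_reverse_iterator rend() const {
        return const_reverse_iterator(begin());
    }

private:
    T data_[N] = {};  // value-initialized elements
};

}  // namespace sketch
```

A node-based container would need real iterator classes instead of pointers, but the set of typedefs and begin/end members stays the same.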
Yes, you need two different iterators to be fully stdlib compliant.
You can get most typedefs right by inheriting from std::iterator, but this won't give you any help with the actual implementation.
Boost.Iterator Facade tries to simplify defining your own iterators and the tutorial is quite helpful.
Should you attempt to do it without helpers you should think about what concept your iterator models and then look at the tables in §24 of the C++ standard. They describe all operations you need to support and what the intended semantics are.