Abstracting containers in C++

I would like to have a virtual base class (interface) that contains a method that returns an iterable object (a container?). The implementing classes would each handle their containers themselves, for example, class A would implement the container as a vector, class B as a list, etc.
Essentially, I would like to do something like this:
class Base {
public:
    virtual Container getContainer() = 0;
};
class A : public Base {
    Vector v;
public:
    Container getContainer() { return v; }
};
A a;
Iterator begin = a.getContainer().begin();
Iterator end = a.getContainer().end();
As in, the caller will be responsible for handling the iterator (calling the begin and end functions for iteration for example)
I assume something like this is possible, but I can't figure out how to do it. Specifically, I assume classes like vector and list inherit from a common interface that defines methods begin() and end(), but I can't figure out what that interface is, and how to handle it.

Check out cppreference.com. This common interface is only a convention and a set of rules, not actual C++ types. So no, you can't use containers polymorphically. The simple reason is efficiency: runtime polymorphism costs performance. For more detailed insight, there are many documents on the design ideas behind the STL.
If you really want to implement this, you would need wrapper classes not only around the STL-style sequence containers (deque, vector, list, array) but also around their iterators, because each iterator type is tied to its container type. The wrapper classes would also have to provide the various methods you'd find in the underlying containers, e.g. begin(), end(), size() etc.
Note that not every container supports all methods. For example, some don't support push_front() but only push_back(). All of them have begin() and end(), but a singly-linked list (std::forward_list since C++11) can't have rbegin() and rend(). You could sort those categories into separate pure virtual base classes.
Overall, I'd doubt the usefulness of this. If you want to swap implementations, design your code so that the container becomes a template parameter to it. All calls are then resolved at compile time, leading to less code, less memory requirements and more performance.
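As a rough sketch of the template-parameter approach suggested above (the names Samples, add, items and sum are made up for illustration and are not from the question), the container type can be passed in from outside, so the same client code works with a vector or a list without any virtual calls:

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Hypothetical class, not from the answer: the container type is a template
// parameter, so every begin()/end() call is resolved at compile time.
template <typename Container>
class Samples {
public:
    using value_type = typename Container::value_type;

    void add(const value_type& v) { data_.push_back(v); }

    // Callers get the concrete container and iterate over it directly.
    const Container& items() const { return data_; }

    value_type sum() const {
        return std::accumulate(data_.begin(), data_.end(), value_type{});
    }

private:
    Container data_;
};

// The same client code compiles for either backing container.
template <typename Container>
int sum_one_to_three() {
    Samples<Container> s;
    s.add(1);
    s.add(2);
    s.add(3);
    int total = 0;
    for (int x : s.items()) total += x;
    assert(total == s.sum());
    return total;
}
```

Calling sum_one_to_three<std::vector<int>>() and sum_one_to_three<std::list<int>>() runs identical logic on two different containers. The price is that the container choice is fixed at compile time.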

Containers are not polymorphic. There is no common base class. The same is true of iterators.
This is intentional, because iterating element by element turns out to be highly inefficient when done through a virtual method based interface. It can be done, but it is slow.
Boost has any ranges and any iterators, or you can roll your own via type erasure techniques. I advise against it.
The simplest and cheapest way to get iteration polymorphic is adding a foreach_element(std::function<void(Element const&)>)const method. You can batch up the iteration to reduce overhead with foreach_element(std::function<void(std::span<Element const>)>)const, allowing elements to be clumped by the container. That would be easy to write and faster than fully polymorphic iterators and containers.
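A minimal sketch of the foreach_element idea, assuming Element is just int and reusing the class names A and B from the question (B, sum and demo_foreach are illustrative additions):

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <utility>
#include <vector>

using Element = int;  // assumption: stand-in element type for the sketch

class Base {
public:
    virtual ~Base() = default;
    // One virtual call per traversal, one std::function call per element.
    virtual void foreach_element(
        const std::function<void(const Element&)>& f) const = 0;
};

class A : public Base {  // stores its elements in a vector
public:
    explicit A(std::vector<Element> v) : v_(std::move(v)) {}
    void foreach_element(
        const std::function<void(const Element&)>& f) const override {
        for (const Element& e : v_) f(e);
    }
private:
    std::vector<Element> v_;
};

class B : public Base {  // stores its elements in a list
public:
    explicit B(std::list<Element> l) : l_(std::move(l)) {}
    void foreach_element(
        const std::function<void(const Element&)>& f) const override {
        for (const Element& e : l_) f(e);
    }
private:
    std::list<Element> l_;
};

// Client code only sees Base and never touches an iterator type.
inline int sum(const Base& b) {
    int total = 0;
    b.foreach_element([&total](const Element& e) { total += e; });
    return total;
}

inline void demo_foreach() {
    A a(std::vector<Element>{1, 2, 3});
    B b(std::list<Element>{4, 5});
    assert(sum(a) == 6);
    assert(sum(b) == 9);
}
```

The caller gives up control over the loop (no early exit unless you add one to the callback signature), which is the trade-off for not having to type-erase iterators.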

Related

Why there are no overloads of find/lower_bound for std::map and no overload of sort for std::list?

I know that you should never use std::find(some_map.begin(), some_map.end()) or std::lower_bound, because it will take linear time instead of logarithmic provided by some_map.lower_bound. Similar thing happens with std::list: there is std::list::sort function for sorting, but you're unable to call std::sort(some_list.begin(), some_list.end()), because iterators are not random-access.
However, std::swap, for instance, has overloads for standard containers, so that a call of swap(some_map, other_map) takes O(1), not O(n). Why doesn't the C++ standard give us specialized versions of lower_bound and find for maps and sets? Is there a deep reason?
I don't think there's any deep reason, but it's more philosophical than anything. The free function forms of standard algorithms, including the ones you mention, take iterator pairs indicating the range over which they'll traverse. There's no way for the algorithm to determine the type of the underlying container from these iterators.
Providing specializations, or overloads, would be a departure from this model since you'd have to pass the container itself to the algorithm.
swap is different because it takes instances of the types involved as arguments, and not just iterators.
The rule followed by the STL is that the free-function algorithms operate on iterators and are generic, not caring about the type of range the iterators belong to (other than the iterator category). If a particular container type can implement an operation more efficiently than using std::algo(cont.begin(), cont.end()) then that operation is provided as a member function of the container and called as cont.algo().
So you have std::map<>::lower_bound() and std::map<>::find() and std::list<>::sort().
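To make the rule concrete, here is a small sketch (the alias Table and the function names are only for illustration). Both lookups return the same iterator, but the free algorithm has to step through the map's bidirectional iterators one by one, while the member function starts at the root of the tree:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <map>
#include <utility>

using Table = std::map<int, char>;  // illustrative alias

// Generic algorithm: correct, but advancing map iterators costs linear time.
inline Table::const_iterator generic_lower_bound(const Table& m, int key) {
    return std::lower_bound(
        m.begin(), m.end(), key,
        [](const Table::value_type& entry, int k) { return entry.first < k; });
}

// Member function: descends from the root, logarithmic time.
inline Table::const_iterator member_lower_bound(const Table& m, int key) {
    return m.lower_bound(key);
}

// std::sort needs random-access iterators, so std::list has a member sort.
inline std::list<int> sorted(std::list<int> l) {
    l.sort();
    return l;
}

inline void demo_members() {
    const Table m{{1, 'a'}, {3, 'b'}, {5, 'c'}};
    assert(generic_lower_bound(m, 2) == member_lower_bound(m, 2));
    assert(member_lower_bound(m, 2)->first == 3);
    assert(sorted({3, 1, 2}) == (std::list<int>{1, 2, 3}));
}
```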
The simple fact that such functions should work in terms of iterator pairs would produce a quite significant overhead, even if specialized for map/set iterators.
Keep in mind that:
Such functions can be called with any pair of iterators, not just a begin/end.
Iterators of map/sets are usually implemented as a pointer to a leaf node, while the starting node of the member find/lower_bound is the root node of the tree.
Having a member find and lower_bound is better, because a pointer to the root node is directly stored as a member of the map/set object.
A hypothetical non-member find would have to traverse the tree to find the lowest common ancestor of the two input nodes, and then do a depth-first search, while still being careful to search only in the [first,last) range, which is significantly more expensive.
Yes, you could keep track of the root node inside the iterator, then optimize if the function is called with a begin/end pair... just to avoid a member function?
There are good "design philosophy" reasons for not providing those overloads. In particular, the whole STL containers / algorithms interaction is designed to go through the different iterator concepts such that those two parts remain independent, so that you can define new container types without having to overload for all the algorithms, and so that you can define new algorithms without having to overload for all the containers.
Beyond those design choices, there is a very good technical reason why these algorithms (sort, lower-bound, etc..) cannot be overloaded for the special containers like list or map / set. The reason is that there is generally no reason for iterators to be explicitly connected to their parent container. In other words, you can get list iterators (begin / end) from the list container, but you cannot obtain (a reference to) the list container from a list iterator. The same goes for map / set iterators. Iterators can be implemented to be "blind" to their parent container. For example, a container like list would normally contain, as data members, pointers to the head and tail nodes, and the allocator object, but the iterators themselves (usually just pointers to a node) don't need to know about the head / tail or allocator, they just have to walk the previous / next link pointers to move back / forth in the list. So, if you were to implement an overload of std::sort for the list container, there would be no way to retrieve the list container from the iterators that are passed to the sort function. And, in all the cases you mentioned, the versions of the algorithms that are appropriate for those special containers are "centralized" in the sense that they need to be run at the container level, not at the "blind" iterator level. That's why they make sense as member functions, and that's why you cannot get to them from the iterators.
However, if you need to have a generic way to call these algorithms regardless of the container that you have, then you can provide overloaded function templates at the container level if you want to. It's as easy as this:
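#include <algorithm>
#include <iterator>
#include <list>
#include <map>

// Generic case: containers with random-access iterators.
template <typename Container>
void sort_container(Container& c) {
    std::sort(std::begin(c), std::end(c));
}
// std::list provides its own member sort.
template <typename T, typename Alloc>
void sort_container(std::list<T, Alloc>& c) {
    c.sort();
}
// std::map is always kept sorted by key.
template <typename Key, typename T, typename Compare, typename Alloc>
void sort_container(std::map<Key, T, Compare, Alloc>&) { /* nothing */ }
//... so on..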
The point here is that if you are already tied in with STL containers, then you can tie in with STL algorithms in this way if you want to. But don't expect the C++ standard library to force this kind of inter-dependence between containers and algorithms on you, because not everyone wants it: a lot of people really like the STL algorithms but have no love for the STL containers, which have numerous problems.

Accumulating many vectors into a single container w/o copying

Is it possible in C++11 to accumulate many std::vectors, each returned by a given function (whose API I cannot change), into a std container without copying any element?
std::vector<int> make_vect();
container acc; // what is container?
do {
acc.append(std::move(make_vect())); // how to implement this?
} while(acc.size() < n);
Note 1 that elements must not be copied even if they have no move constructor or move assignment operator, such as int in the example. So you can move a chunk of elements (by copying a pointer), but not individual elements.
Note 2 that container must allow for iteration through all accumulated elements using a single iterator. So std::vector<std::vector<>> or similar is not allowed.
Clearly, it is straightforward to write some container allowing this or to use std::list<std::vector<>> and provide your own iterator, but does the std library provide the desired functionality without such user-written additions?
It seems that the requested functionality is nothing particularly outlandish and I'm surprised how hard (if not impossible) it is even with C++11.
TL;DR I don't think a stock Standard Container can do what you ask. Here's why.
Remember that move semantics for containers are efficient because they are implemented as scope-bound handles to dynamically allocated memory. Moving a container is implemented as copying the handle, while not touching the contained elements.
Your first (explicit) constraint is not to copy any element of any of the containers. That necessitates copying the handles into your putative acc_container. In other words, you want an acc_container<std::vector<T>>. Any of the Standard Containers will allow you to do that efficiently no matter how big the individual T elements are.
Your second (implicit, inferred from the comments) constraint is that you want to have a uniform interface over all the elements of all individual vectors. In other words, you would like to use this as acc_container<T>. This requires an extra level of indirection in the iterators of your acc_container, where the iterator detects that it has reached the end of the current vector<T> and jumps to the beginning of the next vector<T>.
Such a container does not exist in the Standard Library.
The easiest work-around is to use a std::vector<std::vector<T>> (to avoid copying T elements), and write your own iterator adaptors (e.g. using boost::indirect_iterator, to provide iteration over T elements).
Unfortunately, even if you provide non-member begin() / end() functions that initialize these indirect iterators from the member .begin() / .end(), range-for will not use ADL to look up these functions because it will prefer the existing member functions .begin() / .end(). Furthermore, you will not be able to e.g. insert() a T element directly into your compound container, unless you also provide non-member insert() (and similarly for other functionality).
So if you want a bona fide compound container, with range-for support and a native interface of member functions, you need to write one yourself (with std::vector<std::vector<T>> as back-end, and the entire std::vector<T> interface written on top of it). Maybe you can suggest it on the Boost mailing list as a nice project.
UPDATE: there is an old paper by Matt Austern on Segmented Iterators and Hierarchical Algorithms that shows some performance benefits from this approach. The downside is that you also need to make standard algorithms aware of these iterators.
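For what it's worth, a hand-written version of the "vector of vectors plus flattening iterator" idea could look roughly like the sketch below. The class name Chunked and its interface are invented for this example; it is read-only and forward-only, and nowhere near a full std::vector interface:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical accumulator: stores whole vectors (moved in, so no element is
// copied) and exposes a single forward iterator over all elements.
template <typename T>
class Chunked {
    using Chunks = std::vector<std::vector<T>>;
public:
    class const_iterator {
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = T;
        using difference_type = std::ptrdiff_t;
        using pointer = const T*;
        using reference = const T&;

        const_iterator() = default;
        const_iterator(const Chunks* chunks, std::size_t outer)
            : chunks_(chunks), outer_(outer) { skip_empty(); }

        reference operator*() const { return (*chunks_)[outer_][inner_]; }
        pointer operator->() const { return &**this; }
        const_iterator& operator++() {
            // At the end of the current chunk, jump to the next non-empty one.
            if (++inner_ == (*chunks_)[outer_].size()) {
                ++outer_;
                inner_ = 0;
                skip_empty();
            }
            return *this;
        }
        const_iterator operator++(int) { auto old = *this; ++*this; return old; }
        friend bool operator==(const const_iterator& a, const const_iterator& b) {
            return a.outer_ == b.outer_ && a.inner_ == b.inner_;
        }
        friend bool operator!=(const const_iterator& a, const const_iterator& b) {
            return !(a == b);
        }
    private:
        void skip_empty() {
            while (outer_ < chunks_->size() && (*chunks_)[outer_].empty()) ++outer_;
        }
        const Chunks* chunks_ = nullptr;
        std::size_t outer_ = 0;
        std::size_t inner_ = 0;
    };

    // Takes ownership of the chunk's buffer; only the vector handle moves.
    void append(std::vector<T>&& chunk) {
        size_ += chunk.size();
        chunks_.push_back(std::move(chunk));
    }
    std::size_t size() const { return size_; }
    const_iterator begin() const { return const_iterator(&chunks_, 0); }
    const_iterator end() const { return const_iterator(&chunks_, chunks_.size()); }

private:
    Chunks chunks_;
    std::size_t size_ = 0;
};

inline void demo_chunked() {
    Chunked<int> acc;
    acc.append(std::vector<int>{1, 2});
    acc.append(std::vector<int>{});  // empty chunks are skipped
    acc.append(std::vector<int>{3});
    int total = 0;
    for (int x : acc) total += x;
    assert(total == 6);
    assert(acc.size() == 3);
}
```

Because begin() and end() are members here, range-for works directly. Every increment pays for the end-of-chunk check, which is the overhead that segmented-iterator aware algorithms try to avoid.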

Specializing STL algorithms so they automatically call efficient container member functions when available

The STL has global algorithms that can operate on arbitrary containers as long as they support the basic requirements of that algorithm. For example, some algorithms may require a container to have random access iterators like a vector rather than a list.
When a container has a faster way of doing something than the generic algorithm would, it provides a member function with the same name to achieve the same goal - like a list providing its own remove_if() since it can remove elements by just doing pointer operations in constant time.
My question is - is it possible/advisable to specialize the generic algorithms so they automatically call the member function version for containers where it's more efficient? E.g. have std::remove_if call list::remove_if internally for lists. Is this already done in the STL?
Not in the case of remove_if, since the semantics are different. std::remove_if doesn't actually erase anything from the container, whereas list::remove_if does, so you definitely don't want the former calling the latter.
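A short sketch of that difference (the helper names are made up for the example): std::remove_if only compacts the kept elements and returns the new logical end, so the caller has to erase the tail, whereas list::remove_if really removes the nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

inline bool is_even(int x) { return x % 2 == 0; }

// std::remove_if moves the kept elements to the front and returns the new
// logical end; the container keeps its size.
inline std::vector<int>::iterator compact_odds(std::vector<int>& v) {
    return std::remove_if(v.begin(), v.end(), is_even);
}

// Erase-remove idiom: the caller shrinks the container explicitly.
inline void erase_evens(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(), is_even), v.end());
}

// std::list::remove_if unlinks and destroys the matching nodes itself.
inline void erase_evens(std::list<int>& l) {
    l.remove_if(is_even);
}

inline void demo_remove_if() {
    std::vector<int> a{1, 2, 3, 4};
    auto new_end = compact_odds(a);
    assert(a.size() == 4);             // nothing was erased
    assert(new_end - a.begin() == 2);  // kept elements: 1 3

    std::vector<int> b{1, 2, 3, 4};
    erase_evens(b);
    assert(b.size() == 2);

    std::list<int> c{1, 2, 3, 4};
    erase_evens(c);
    assert(c.size() == 2);
}
```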
Neither you nor the implementation can literally specialize the generic algorithms for containers because the algorithms are function templates that take iterators, and the containers are themselves class templates, whose iterator type depends on the template parameter. So in order to specialize std::remove_if generically for list<T>::iterator you would need a partial specialization of remove_if, and there ain't no such thing as a partial specialization of a function template.
I can't remember whether implementations are allowed to overload algorithms for particular iterator types, but even if not the "official" algorithm can call a function that could be overloaded, or it can use a class that could be partially specialized. Unfortunately none of these techniques help you if you've written your own container, and have spotted a way to make a standard algorithm especially efficient for it.
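To illustrate the "class that could be partially specialized" workaround, here is a sketch that works at the container level rather than on iterators (the names Sorter and sort_all are hypothetical). A function template forwards to a class template, and the class template is what gets partially specialized:

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Primary template: fall back to the generic algorithm.
template <typename Container>
struct Sorter {
    static void apply(Container& c) { std::sort(c.begin(), c.end()); }
};

// Partial specialization for every std::list<T, Alloc>: use the member sort.
template <typename T, typename Alloc>
struct Sorter<std::list<T, Alloc>> {
    static void apply(std::list<T, Alloc>& c) { c.sort(); }
};

// The function template itself is never specialized; it only dispatches.
template <typename Container>
void sort_all(Container& c) {
    Sorter<Container>::apply(c);
}

inline void demo_sort_all() {
    std::vector<int> v{3, 1, 2};
    std::list<int> l{3, 1, 2};
    sort_all(v);
    sort_all(l);
    assert(std::is_sorted(v.begin(), v.end()));
    assert(std::is_sorted(l.begin(), l.end()));
}
```

This only helps if the caller has the container in hand; with a bare iterator pair there is still nothing to dispatch on beyond the iterator type.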
Suppose for example you have a container with a random-access iterator, but where you have an especially efficient sort technique that works with the standard ordering: a bucket sort, perhaps. Then you might think of putting a free function template <typename T> void sort(MyContainer<T>::iterator first, MyContainer<T>::iterator last) in the same namespace as the class, and allow people to call it with using std::sort; sort(it1, it2); instead of std::sort(it1, it2);. The problem is that if they do that in generic code, they risk that someone else writing some other container type will have a function named sort that doesn't even sort the range (the English word "sort" has more than one meaning, after all). So basically you cannot generically sort an iterator range in a way that picks up on efficiencies for user-defined containers.
When the difference in the code depends only on the category of the iterator (for example std::distance which is fast for random access iterators and slow otherwise), this is done using something called "iterator tag dispatch", and that's the most common case where there's a clear efficiency difference between different containers.
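Iterator tag dispatch can be sketched like this. It is a simplified imitation of what std::distance does, and my_distance and distance_impl are names invented for the example, not what any particular standard library uses:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Slow path: any input iterator, count the steps one by one.
template <typename It>
typename std::iterator_traits<It>::difference_type
distance_impl(It first, It last, std::input_iterator_tag) {
    typename std::iterator_traits<It>::difference_type n = 0;
    for (; first != last; ++first) ++n;
    return n;
}

// Fast path: random-access iterators support subtraction.
template <typename It>
typename std::iterator_traits<It>::difference_type
distance_impl(It first, It last, std::random_access_iterator_tag) {
    return last - first;
}

// The tag object selects the overload at compile time. Tags of weaker
// categories convert to input_iterator_tag through inheritance.
template <typename It>
typename std::iterator_traits<It>::difference_type
my_distance(It first, It last) {
    return distance_impl(
        first, last, typename std::iterator_traits<It>::iterator_category());
}

inline void demo_tag_dispatch() {
    std::vector<int> v{1, 2, 3, 4};
    std::list<int> l{1, 2, 3};
    assert(my_distance(v.begin(), v.end()) == 4);  // subtraction
    assert(my_distance(l.begin(), l.end()) == 3);  // counting loop
}
```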
If there are any remaining cases that apply to standard containers (discounting ones where the result is different or where the efficiency only requires a particular iterator category), let's have them.
It is not possible - the algorithms work with iterators, and iterators have no knowledge of the container object they refer to. Even if they did, there would be no way to determine at compile time whether a given iterator range refers to the whole of a container, so it could not be done by specialisation alone; there would need to be an extra runtime check.
The only way to do this would be to create your own wrapper templates for each of the algorithms, taking a container rather than a pair of iterators. Then you could specialize your wrapper for each container type that could be optimized. Unfortunately this removes a lot of the flexibility of the standard algorithms, and litters your programs with a bunch of non-standard calls.
What you are looking for isn't specialization, but overloading. You could provide alternative versions of the algorithms (not, legally, in namespace std, though) that take containers as arguments rather than iterator pairs, and either call their STL counterparts with the container's begin() and end(), or do something else. Such an approach does of course require the code to call your functions instead of the STL functions.
This has certainly been done before, so you might actually find a set of headers out there saving you the work to write all this boilerplate code.

Do you have to implement multiple iterators in a STL-like class?

I'm quite familiar with the STL and how to use it. My question is...
If I were to implement my own STL container type, how are the internal iterators defined? STL classes tend to have sequential or random-access iterators, const_ versions of these, and stream iterators.
Are these iterators all fully-defined in every STL class, or is there some sort of base class that you inherit from to gain most of the iterator functionality? Does anyone know a good reference for how to implement a class that supports these different kinds of iterators?
Generally, you only have to implement iterator and const_iterator. If reverse iterators are desired, they can be obtained using instantiations of std::reverse_iterator. The stream iterators will use operator>> and operator<<; typically, they aren't appropriate for a container (and none of the standard containers provides them).
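As a minimal sketch of that (FixedBuffer is a made-up container whose iterators are plain pointers, which keeps the example short), only iterator and const_iterator are written by hand and the reverse iterators come from std::reverse_iterator:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical fixed-size container, assuming N > 0.
template <typename T, std::size_t N>
class FixedBuffer {
public:
    using value_type = T;
    using iterator = T*;              // raw pointers are random-access iterators
    using const_iterator = const T*;
    using reverse_iterator = std::reverse_iterator<iterator>;
    using const_reverse_iterator = std::reverse_iterator<const_iterator>;

    iterator begin() { return data_; }
    iterator end() { return data_ + N; }
    const_iterator begin() const { return data_; }
    const_iterator end() const { return data_ + N; }
    const_iterator cbegin() const { return data_; }
    const_iterator cend() const { return data_ + N; }

    // Reverse iterators are just adaptors over the forward ones.
    reverse_iterator rbegin() { return reverse_iterator(end()); }
    reverse_iterator rend() { return reverse_iterator(begin()); }
    const_reverse_iterator rbegin() const { return const_reverse_iterator(end()); }
    const_reverse_iterator rend() const { return const_reverse_iterator(begin()); }

private:
    T data_[N] = {};
};

inline void demo_fixed_buffer() {
    FixedBuffer<int, 3> buf;
    int next = 1;
    for (int& x : buf) x = next++;  // fills 1 2 3 through iterator
    const FixedBuffer<int, 3>& view = buf;
    assert(*view.begin() == 1);
    assert(*view.rbegin() == 3);    // reverse traversal starts at the back
}
```

For a node-based container the two iterator types would be real classes, but the reverse_iterator typedefs and the rbegin() / rend() members would look the same.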
Yes, you need two different iterators to be fully stdlib compliant.
You can get most typedefs right by inheriting from std::iterator (deprecated since C++17), but this won't give you any help with the actual implementation.
Boost.Iterator Facade tries to simplify defining your own iterators and the tutorial is quite helpful.
Should you attempt to do it without helpers, you should think about what concept your iterator models and then look at the tables in §24 of the C++ standard. They describe all operations you need to support and what the intended semantics are.

Immutable C++ container class

Say that I have a C++ class, Container, that contains some elements of type Element. For various reasons, it is inefficient, undesirable, unnecessary, impractical, and/or impossible (1) to modify or replace the contents after construction. Something along the lines of const std::list<const Element> (2).
Container can meet many requirements of the STL's "container" and "sequence" concepts. It can provide the various types like value_type, reference, etc. It can provide a default constructor, a copy constructor, a const_iterator type, begin() const, end() const, size, empty, all the comparison operators, and maybe some of rbegin() const, rend() const, front(), back(), operator[](), and at().
However, Container can't provide insert, erase, clear, push_front, push_back, non-const front, non-const back, non-const operator[], or non-const at with the expected semantics. So it appears that Container can't qualify as a "sequence". Further, Container can't provide operator=, and swap, and it can't provide an iterator type that points to a non-const element. So, it can't even qualify as a "container".
Is there some less-capable STL concept that Container meets? Is there a "read-only container" or an "immutable container"?
If Container doesn't meet any defined level of conformance, is there value in partial conformance? Is it misleading to make it look like a "container", when it doesn't qualify? Is there a concise, unambiguous way that I can document the conformance so that I don't have to explicitly document the conforming semantics? And similarly, a way to document it so that future users know they can take advantage of read-only generic code, but don't expect mutating algorithms to work?
What do I get if I relax the problem so Container is Assignable (but its elements are not)? At that point, operator= and swap are possible, but dereferencing iterator still returns a const Element. Does Container now qualify as a "container"?
const std::list<T> has approximately the same interface as Container. Does that mean it is neither a "container" nor a "sequence"?
Footnote (1) I have use cases that cover this whole spectrum. I have a would-be-container class that adapts some read-only data, so it has to be immutable. I have a would-be-container that generates its own contents as needed, so it's mutable but you can't replace elements the way the STL requires. I have yet another would-be-container that stores its elements in a way that would make insert() so slow that it would never be useful. And finally, I have a string that stores text in UTF-8 while exposing a code-point oriented interface; a mutable implementation is possible but completely unnecessary.
Footnote (2) This is just for illustration. I'm pretty sure std::list requires an assignable element type.
The STL doesn't define any lesser concepts; mostly because the idea of const is usually expressed on a per-iterator or per-reference level, not on a per-class level.
You shouldn't provide iterator with unexpected semantics, only provide const_iterator. This allows client code to fail in the most logical place (with the most readable error message) if they make a mistake.
Possibly the easiest way to do it would be to encapsulate it and prevent all non-const aliases.
class example {
    std::list<sometype> stuff;
public:
    void Process(...) { ... }
    // Read-only access: callers cannot mutate through this reference.
    const std::list<sometype>& Results() const { return stuff; }
};
Now any client code knows exactly what it can do with the return value of Results(): nothing that requires mutation.
As long as your object can provide a conforming const_iterator it doesn't have to have anything else. It should be pretty easy to implement this on your container class.
(If applicable, look at the Boost.Iterators library; it has iterator_facade and iterator_adaptor classes to help you with the nitty-gritty details)
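A rough sketch of such a read-only class, assuming a std::vector as hidden storage (ReadOnlySeq and total are names invented here): it exposes const_iterator and const accessors only, so read-only generic code and range-for work, while anything that needs a mutable iterator fails to compile.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical immutable sequence: deliberately no `iterator` typedef and no
// non-const accessors.
template <typename T>
class ReadOnlySeq {
public:
    using value_type = T;
    using const_reference = const T&;
    using const_iterator = typename std::vector<T>::const_iterator;
    using size_type = std::size_t;

    ReadOnlySeq(std::initializer_list<T> init) : data_(init) {}

    const_iterator begin() const { return data_.begin(); }
    const_iterator end() const { return data_.end(); }
    size_type size() const { return data_.size(); }
    bool empty() const { return data_.empty(); }
    const_reference operator[](size_type i) const { return data_[i]; }

private:
    std::vector<T> data_;  // contents are fixed at construction
};

// Read-only generic code works unchanged.
inline int total(const ReadOnlySeq<int>& s) {
    int sum = 0;
    for (int x : s) sum += x;
    return sum;
}

inline void demo_read_only() {
    ReadOnlySeq<int> s{1, 2, 3};
    assert(total(s) == 6);
    assert(s.size() == 3);
    assert(s[0] == 1);
}
```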