Custom iterator for multiple containers in C++

I have a pure abstract class and two derived classes that I use to store the same kind of data, let's say int, but in different data structures, let's say a map and a vector.
class AbstractContainer {
public:
    virtual ~AbstractContainer() = default;
    virtual MyIterator firstValue() = 0;
};
class ContainerMap : public AbstractContainer {
private:
    map<K, int> data;
public:
    MyIterator firstValue() override { /* return iterator over map values (int) */ }
};
class ContainerVector : public AbstractContainer {
private:
    vector<int> data;
public:
    MyIterator firstValue() override { /* return iterator over vector values (int) */ }
};
In ContainerMap I can subclass map<K, int>::iterator to iterate over the map values.
But how can I define a generic iterator MyIterator, independent of the data structure, in such a way that given a pointer of type AbstractContainer I can iterate over the values ignoring the actual structure storing the data? And besides that, is this a good practice?
Edit
This question is a simplification of the problem. In my project one of the subclasses stores my objects in memory (in a std::map) while the other retrieves the objects from an external database. I am trying to create a common interface to access the collection of objects that is independent of the data source, because the operations (search, insertion and deletion) would be exactly the same.

Well, no, it's not good practice.
The reason that there is more than one container type (for example, in the STL) is that there is no single container that is optimised for everything. So, one container type might be better suited to a use case where elements are inserted into a container once and it is iterated over multiple times, and another container might be better suited to code that needs to repeatedly add and remove elements from the middle.
The reason STL containers each specify their own iterators is that iterating over each container works in different ways. An iterator suited to working with a vector will - at best - be inefficient on a list and - at worst - will not work correctly.
That said, as in the STL, there is nothing stopping two different containers from using the same name for their iterators. So Container_X and Container_Y can both have an iterator named Iterator, but Container_X::Iterator does not need to work the same way as Container_Y::Iterator.
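A quick illustration of the point about shared names, using two standard containers in place of the hypothetical Container_X/Container_Y: std::vector and std::list both expose a nested `iterator` type, but the two types are distinct and advertise different capabilities, and a template can still use either through the common name. `sum_all` is an illustrative name, not a library function.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Same nested name, different types with different capabilities.
static_assert(!std::is_same<std::vector<int>::iterator, std::list<int>::iterator>::value,
              "vector and list iterators are distinct types");
static_assert(std::is_same<std::iterator_traits<std::vector<int>::iterator>::iterator_category,
                           std::random_access_iterator_tag>::value,
              "vector iterators are random access");
static_assert(std::is_same<std::iterator_traits<std::list<int>::iterator>::iterator_category,
                           std::bidirectional_iterator_tag>::value,
              "list iterators are only bidirectional");

// Generic code names C::const_iterator without knowing how it works inside.
template <typename C>
int sum_all(const C& c) {
    int total = 0;
    for (typename C::const_iterator it = c.begin(); it != c.end(); ++it)
        total += *it;
    return total;
}
```

The polymorphism here is resolved at compile time, so each instantiation uses the container's own, fully efficient iterator.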
You're not the first person who wants code that is container-agnostic (although you've effectively worded it as "agnostic to the iterator"), and you won't be the last. Unless some great mind manages to specify a container type with all operations optimal for all possible use cases (in contrast with the current state of play, in which each container type is optimal for some use cases but poor for others), container-agnostic code is a futile goal. An iterator that can work across all containers will probably be maximally inefficient, by numerous measures, for one or more operations on most, if not all, of the different container types.
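For comparison, here is roughly what the "generic iterator" from the question could look like if you built it anyway, as a minimal type-erasure sketch. It assumes `K` is `std::string`, makes `firstValue()` const, and uses hypothetical names (`IntValueIterator`, `sum_values`); the data members are public only to keep it short. It shows the costs this answer warns about: a heap allocation per iterator, a virtual call on every `*` and `++`, and single-pass iteration for every container, even those that could do better.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type-erased, single-pass iterator over int values.
class IntValueIterator {
    struct Concept {                       // runtime interface hiding the real iterator
        virtual ~Concept() = default;
        virtual int get() const = 0;
        virtual void next() = 0;
        virtual bool done() const = 0;
    };
    template <typename It, typename Get>
    struct Model : Concept {               // wraps a concrete [cur, end) range
        It cur, end;
        Get get_value;                     // projects an element to its int value
        Model(It b, It e, Get g) : cur(b), end(e), get_value(g) {}
        int get() const override { return get_value(*cur); }
        void next() override { ++cur; }
        bool done() const override { return cur == end; }
    };
    std::unique_ptr<Concept> impl_;        // one heap allocation per iterator
public:
    template <typename It, typename Get>
    IntValueIterator(It b, It e, Get g) : impl_(new Model<It, Get>(b, e, g)) {}
    bool done() const { return impl_->done(); }
    int operator*() const { return impl_->get(); }            // virtual call
    IntValueIterator& operator++() { impl_->next(); return *this; }  // virtual call
};

class AbstractContainer {
public:
    virtual ~AbstractContainer() = default;
    virtual IntValueIterator firstValue() const = 0;
};

class ContainerMap : public AbstractContainer {
public:
    std::map<std::string, int> data;       // assumption: K = std::string
    IntValueIterator firstValue() const override {
        return IntValueIterator(data.begin(), data.end(),
                                [](const std::pair<const std::string, int>& kv) { return kv.second; });
    }
};

class ContainerVector : public AbstractContainer {
public:
    std::vector<int> data;
    IntValueIterator firstValue() const override {
        return IntValueIterator(data.begin(), data.end(), [](int v) { return v; });
    }
};

// Works through the base class only, without knowing the storage.
inline int sum_values(const AbstractContainer& c) {
    int total = 0;
    for (IntValueIterator it = c.firstValue(); !it.done(); ++it) total += *it;
    return total;
}
```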

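What you would want to do is create a separate iterator class that meets the C++ standard iterator requirements. Older code does this by inheriting from the standard std::iterator class template, but that template is deprecated since C++17, so declaring the five member types (iterator_category, value_type, difference_type, pointer, reference) directly is now preferred. Then you need to implement all the standard iterator operations within your iterator class (i.e. dereference, ++, ==, !=, etc.).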
Within your data structures you would want to have a function that will return the successor node/value from any point within the structure - this function will be called by the iterator's overloaded ++ operator in order to move to the next node/value in the data structure, in the order you want. For example, for a vector, given an index, you'd want your successor method to return the index that follows the given index in the vector.
From what I understand, though, you want your iterator to be generic so that you could use the same iterator class for more than one data structure. This might be possible through the use of templates and checks within your iterator class; however, it will probably not be a very secure implementation - not recommended.
Is it bad practice to subclass a standard library data structure and do what you're trying to do here? In the real world it would probably be considered bad practice, yes. But for experimentation purposes or a personal project, I'm sure it would be a good learning experience!
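The approach above can be sketched as follows, assuming a hypothetical `IntStore` backed by a std::vector<int>: the container exposes `successor()`, and the iterator's `operator++` delegates to it. The five member types are declared directly rather than inherited from std::iterator.

```cpp
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical container that knows how to step from one position to the next.
class IntStore {
public:
    explicit IntStore(std::vector<int> v) : items_(std::move(v)) {}
    std::size_t successor(std::size_t i) const { return i + 1; }  // vector: next index
    const int& at(std::size_t i) const { return items_[i]; }
    std::size_t size() const { return items_.size(); }

    class const_iterator {
    public:
        // Declared directly instead of inheriting from std::iterator.
        using iterator_category = std::forward_iterator_tag;
        using value_type = int;
        using difference_type = std::ptrdiff_t;
        using pointer = const int*;
        using reference = const int&;

        const_iterator() : store_(nullptr), pos_(0) {}
        const_iterator(const IntStore* s, std::size_t i) : store_(s), pos_(i) {}
        reference operator*() const { return store_->at(pos_); }
        pointer operator->() const { return &store_->at(pos_); }
        // ++ asks the container for the successor position.
        const_iterator& operator++() { pos_ = store_->successor(pos_); return *this; }
        const_iterator operator++(int) { const_iterator tmp = *this; ++*this; return tmp; }
        bool operator==(const const_iterator& o) const { return store_ == o.store_ && pos_ == o.pos_; }
        bool operator!=(const const_iterator& o) const { return !(*this == o); }

    private:
        const IntStore* store_;
        std::size_t pos_;
    };

    const_iterator begin() const { return const_iterator(this, 0); }
    const_iterator end() const { return const_iterator(this, items_.size()); }

private:
    std::vector<int> items_;
};
```

Because it satisfies the forward-iterator requirements, range-for, std::distance and std::next all work with it.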

Related

why is begin() needed in std::vector erase?

Why do we have to write v.erase(v.begin(), v.begin()+3) ?
Why isn't it defined as erase(int, int) so you can write v.erase(0,2) and the implementation takes care of the begin()s?
The interface container.erase(iterator, iterator) is more general and works for containers that don't have indexing, like std::list. This is an advantage if you write templates and don't really know exactly which container the code is to work on.
The original design aimed at being as general as possible, and iterators are more general than indexes. The designers could have added extra index-based overloads for vector, but decided not to.
In the STL, iterators are the only entity that provides general access to STL containers.
An array data structure can be accessed via pointers and indexes; iterators are a generalization of these indexes/pointers.
A linked list can be accessed by moving pointers (à la ptr = ptr->next); iterators are a generalization of these too.
Trees and hash tables need a special iterator class that encapsulates the logic of iterating over these data structures.
As you can see, iterators are the general type that allows you to do common operations (such as iteration, deletion etc.) on data structures, regardless of their underlying implementation.
This way, you can refactor your code to use std::list and container.erase(it0, it1) still works without modifying the code.
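As a small sketch of that generality: the same function template removes the first n elements from a std::vector or a std::list, because it relies only on `begin()`, `std::next` and `erase(iterator, iterator)`. For a vector, `erase_first_n(v, 3)` does the same as `v.erase(v.begin(), v.begin()+3)`. The helper name is hypothetical, and it assumes n does not exceed the size.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Removes the first n elements from any container offering begin() and
// erase(iterator, iterator). Precondition: 0 <= n <= size.
template <typename Container>
void erase_first_n(Container& c, typename Container::difference_type n) {
    // std::next is O(1) for vector iterators and O(n) for list iterators.
    c.erase(c.begin(), std::next(c.begin(), n));
}
```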

When creating my own data structure, should I use iterators or indices to provide access from the outside?

Suppose I'm writing a project in a modern version of C++ (say 11 or 14) and use STL in that project. At a certain moment, I need to program a specific data structure that can be built using STL containers. The DS is encapsulated in a class (am I right that encapsulating the DS in a class is the only correct way to code it in C++?), thus I need to provide some sort of interface to provide read and/or write access to the data. Which leads us to the question:
Should I use (1a) iterators or (1b) simple "indices" (i.e. numbers of a certain type) for that? The DS that I'm working on right now is pretty much linear, but then when the elements are removed, of course simple integer indices are going to get invalidated. That's about the only argument against this approach that I can imagine.
Which approach is more idiomatic? What are the objective technical arguments for and against each one?
Also, when I choose to use iterators for my custom DS, should I (2a) publicly typedef the iterators of the container that is used internally, or (2b) create my own iterator from scratch? In open-source libraries such as Boost, I've seen custom iterators being written from scratch. On the other hand, I feel I'm not able to write a proper iterator yet (i.e. one that is as detailed and complex as the ones in STL and/or Boost).
Edit, as per πάντα ῥεῖ's request:
I've asked myself this question with a few DS in a few projects while studying at the Uni, but here's the last occurrence that made me come here and ask.
The DS is meant to represent a triangle array, or vertex array, or whatever one might call it. Point is, there are two arrays or lists, one storing the vertex coordinates, and another one storing triplets of indices from the first array, thus representing triangles. (This has been coded a gazillion times already, yet I want to write it on my own, once, for the purpose of learning.) Obviously, the two arrays should stay in sync, hence the encapsulation. The set of operations is meant to include adding (maybe also removing) a vertex, adding and removing a triangle (a vertex triplet) using the vertex data from the same array. How I see it is that the client adds vertices, writes down the indices/iterators, and then issues a call to add a triangle based on those indices/iterators, which in turn returns another index/iterator to the resulting triangle.
I don't see why you couldn't get both, if this makes sense for your container.
std::vector has iterators and the at/operator[] methods to provide access with indexes.
The API of your container depends on the operations you want to make available to your clients.
Is the container iterable, i.e. is it possible to iterate over each elements? Then, you should provide an iterator.
Does it make sense to randomly access elements in your container, knowing their position? Then you can also provide the at(size_t)/operator[size_t] methods.
Does it make sense to randomly access elements in your container, knowing a special "key"? Then you should probably provide the at(key_type)/operator[key_type] methods.
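Applied to the triangle/vertex array from the question, a sketch offering both forms of access might look like this. The names (`TriangleMesh`, `Vertex`, `add_triangle`, and so on) are hypothetical; vertex iterators are simply the internal vector's (option 2a), and vertices and triangles are also reachable by index, which the client uses as a handle when adding triangles.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical mesh that keeps the vertex and triangle arrays in sync.
class TriangleMesh {
public:
    using VertexIndex = std::size_t;
    using Triangle = std::array<VertexIndex, 3>;
    // Option (2a): re-export the internal container's iterator type.
    using vertex_iterator = std::vector<Vertex>::const_iterator;

    VertexIndex add_vertex(const Vertex& v) {
        vertices_.push_back(v);
        return vertices_.size() - 1;       // index handle returned to the client
    }

    std::size_t add_triangle(VertexIndex a, VertexIndex b, VertexIndex c) {
        for (VertexIndex i : {a, b, c})    // reject handles that do not exist
            if (i >= vertices_.size()) throw std::out_of_range("TriangleMesh: bad vertex index");
        triangles_.push_back(Triangle{{a, b, c}});
        return triangles_.size() - 1;
    }

    // Index access (bounds-checked) alongside iterator access.
    const Vertex& vertex(VertexIndex i) const { return vertices_.at(i); }
    const Triangle& triangle(std::size_t t) const { return triangles_.at(t); }
    vertex_iterator vertices_begin() const { return vertices_.begin(); }
    vertex_iterator vertices_end() const { return vertices_.end(); }

private:
    std::vector<Vertex> vertices_;
    std::vector<Triangle> triangles_;
};
```

Because nothing is ever removed here, the indices stay valid; once removal is added, both indices and std::vector iterators can be invalidated, so that decision has to be made either way.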
As for your question regarding custom iterators or reuse of existing iterators:
If your container is basically a wrapper that adds some insertion/removal logic to an existing container, I think it is fine to publicly typedef the existing iterator: a custom iterator may miss some features of the existing iterator, may contain bugs, and will not add any significant feature over it.
On the other hand, if you iterate in a non-standard fashion, you need a custom iterator. For instance, I once implemented a recursive_unordered_map that accepted a parent recursive_unordered_map at construction and would iterate both over its own unordered_map and over its parent's (and its parent's parent's, and so on). I had to implement a custom iterator for this.
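A rough sketch of that kind of non-standard iteration, under simplifying assumptions: a hypothetical `LayeredMap` with an optional parent, whose `const_iterator` walks its own std::unordered_map and then moves up the parent chain. Unlike a real lookup chain, it does not hide keys shadowed by a child, so a duplicated key is visited once per layer.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <unordered_map>

// Hypothetical map whose iteration covers itself, then its parent, and so on.
class LayeredMap {
public:
    using map_type = std::unordered_map<std::string, int>;

    explicit LayeredMap(const LayeredMap* parent = nullptr) : parent_(parent) {}
    void set(const std::string& key, int value) { own_[key] = value; }

    class const_iterator {
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = map_type::value_type;
        using difference_type = std::ptrdiff_t;
        using pointer = const value_type*;
        using reference = const value_type&;

        const_iterator() = default;        // end iterator: layer_ == nullptr
        explicit const_iterator(const LayeredMap* layer) : layer_(layer) {
            if (layer_) { it_ = layer_->own_.begin(); skip_exhausted_layers(); }
        }
        reference operator*() const { return *it_; }
        pointer operator->() const { return &*it_; }
        const_iterator& operator++() { ++it_; skip_exhausted_layers(); return *this; }
        const_iterator operator++(int) { const_iterator tmp = *this; ++*this; return tmp; }
        friend bool operator==(const const_iterator& a, const const_iterator& b) {
            return a.layer_ == b.layer_ && (a.layer_ == nullptr || a.it_ == b.it_);
        }
        friend bool operator!=(const const_iterator& a, const const_iterator& b) { return !(a == b); }

    private:
        // When the current layer runs out, continue with its parent's map.
        void skip_exhausted_layers() {
            while (layer_ && it_ == layer_->own_.end()) {
                layer_ = layer_->parent_;
                if (layer_) it_ = layer_->own_.begin();
            }
        }
        const LayeredMap* layer_ = nullptr;
        map_type::const_iterator it_{};
    };

    const_iterator begin() const { return const_iterator(this); }
    const_iterator end() const { return const_iterator(); }

private:
    const LayeredMap* parent_;
    map_type own_;
};
```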
Which approach is more idiomatic?
Using iterators is definitely the way to go. Functions in <algorithm> don't work with indices. They work with iterators. If you want your container to be enabled for use by the functions in <algorithm>, using iterators is the only way to go.
In general, it is recommended that the class offer its own iterator type. Under the hood, it could be an index or an STL iterator (preferred). But as far as external clients and public APIs are concerned, they only deal with the iterator offered by the class.
Example 1
class Dictionary {
private:
    typedef std::unordered_map<std::string, std::string> DictType;
    DictType dict;
public:
    typedef DictType::iterator DictionaryIterator;
};
Example 2
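class Sequence {
private:
    typedef std::vector<std::string> SeqType;
public:
    struct SeqIterator {
        const SeqType* seq;   // the container being walked
        size_t index;         // current position within *seq
        SeqIterator& operator++() { ++index; return *this; }
        const std::string& operator*() const { return (*seq)[index]; }
    };
};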
If the clients are operating solely on SeqIterator, then the above can later be modified to
class Sequence {
private:
    typedef std::deque<std::string> SeqType;
public:
    typedef SeqType::iterator SeqIterator;
};
without the clients getting affected.
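A compilable sketch of the same idea. Here the storage is a template parameter only so that both variants can be shown side by side (in the answer you would simply change the typedef); `BasicSequence` and `contains` are hypothetical names. The client function uses nothing but the class's own `iterator` type and `std::find` from <algorithm>, so it compiles unchanged for either storage.

```cpp
#include <algorithm>
#include <deque>
#include <string>
#include <vector>

// Hypothetical sequence whose public iterator type hides the storage choice.
template <typename Storage>
class BasicSequence {
public:
    using iterator = typename Storage::const_iterator;  // the class's own public name
    void append(const std::string& s) { items_.push_back(s); }
    iterator begin() const { return items_.begin(); }
    iterator end() const { return items_.end(); }

private:
    Storage items_;
};

// Client code: relies only on begin()/end() and <algorithm>.
template <typename Seq>
bool contains(const Seq& seq, const std::string& word) {
    return std::find(seq.begin(), seq.end(), word) != seq.end();
}
```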

What c++11 paradigm should I use to minimize memory-usage and minimize copying?

PROBLEM
I have an abstract interface Series and a concrete class Primary_Series which satisfies the interface by storing a large std::vector<> of values.
I also have another concrete class Derived_Series which is essentially a transform of the Primary_Series (eg some large Primary_Series multiplied by 3), which I want to be space-efficient, so I do not want to store the entire derived series as a member.
template<typename T>
struct Series
{
    virtual ~Series() = default;
    virtual std::vector<T> const& ref() const = 0;
};
template<typename T>
class Primary_Series : public Series<T>
{
    std::vector<T> m_data;
public:
    virtual std::vector<T> const& ref() const override { return m_data; }
};
template<typename T>
class Derived_Series : public Series<T>
{
    // how to implement ref() ?
};
QUESTION
How should I change this interface/pure-virtual method?
I don't want to return that vector by value because it would introduce unnecessary copying for Primary_Series, but in the Derived_Series case, I definitely need to create some kind of temporary vector. But then I am faced with the issue of how do I make that vector go away once the caller is done with it.
It would be nice if ref() would return a reference to a temporary that goes away as the reference goes away.
Does this mean I should use some kind of std::weak_ptr<>? Would this fit with how Primary_Series works?
What is the best approach to satisfy the "minimize memory usage" and "minimize copying" requirements, including making the Derived_Series temporary go away once the caller is done?
Well, the interface design as it is poses a bit of a problem, because C++ doesn't really do lazy.
Now, since Derived_Series is supposed to be a lazily-evaluated (because you want to be space-efficient) transform of the original Primary_Series, you cannot return a reference of a full, fat vector. (Because that would require you to construct it first.)
So we have to change the interface and the way the _Series share data. Use std::shared_ptr<std::vector<>> to share the data between the Primary_Series and Derived_Series, so that Primary_Series going out of scope cannot invalidate data for your transform.
Then you can change your interface to be more "vector-like". That is, implement some (or all) of the usual data-accessing functions (operator[], at()...) and/or custom iterators, that return transformed values from the original series. These will let you hide some of the implementation details (laziness of the transform, sharing of data...) and still be able to return transformed values with maximal efficiency and let people use your class as a "vector-like", so you don't have to change much of your design. (~Any algo that uses vector will be able to use your class after being made aware of it.)
I've also sketched out a very basic example of what I mean.
(Note: If you have a multithreaded design and mutable Primary_Series, you will have to think a bit about where and what you need synchronized.)
---edit---
After mulling it over a bit more, I also have to note that the implementation of Derived_Series will be kinda painful anyway. Either its methods will have to return by value, and its iterators will basically be input iterators masquerading as a higher class of iterator (because returning by reference doesn't really work for lazily evaluated values), or it will have to fill in its own data structure as positions in the original series are evaluated, which brings a completely different set of tradeoffs.
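A minimal sketch of the approach in this answer, under assumptions: the transform is multiplication by a constant, and the names (`PrimarySeries`, `ScaledSeries`, `ScaledIterator`) are hypothetical stand-ins for the question's classes. The data is shared through std::shared_ptr, so the derived series stays valid even if the primary one is destroyed, and `operator*` returns by value, which is why the iterator only claims to be an input iterator, as the edit above notes.

```cpp
#include <cstddef>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical lazy iterator: computes each transformed value on demand.
template <typename T>
class ScaledIterator {
public:
    using base_iter = typename std::vector<T>::const_iterator;
    using iterator_category = std::input_iterator_tag;  // operator* returns by value
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = T;

    ScaledIterator(base_iter it, T factor) : it_(it), factor_(factor) {}
    T operator*() const { return *it_ * factor_; }       // nothing is stored
    ScaledIterator& operator++() { ++it_; return *this; }
    ScaledIterator operator++(int) { ScaledIterator tmp = *this; ++it_; return tmp; }
    bool operator==(const ScaledIterator& o) const { return it_ == o.it_; }
    bool operator!=(const ScaledIterator& o) const { return it_ != o.it_; }

private:
    base_iter it_;
    T factor_;
};

template <typename T>
class PrimarySeries {
public:
    explicit PrimarySeries(std::vector<T> v)
        : data_(std::make_shared<std::vector<T>>(std::move(v))) {}
    std::shared_ptr<const std::vector<T>> data() const { return data_; }  // shares, no copy

private:
    std::shared_ptr<std::vector<T>> data_;
};

template <typename T>
class ScaledSeries {
public:
    ScaledSeries(const PrimarySeries<T>& src, T factor) : data_(src.data()), factor_(factor) {}
    ScaledIterator<T> begin() const { return {data_->begin(), factor_}; }
    ScaledIterator<T> end() const { return {data_->end(), factor_}; }
    std::size_t size() const { return data_->size(); }
    T operator[](std::size_t i) const { return (*data_)[i] * factor_; }  // by value

private:
    std::shared_ptr<const std::vector<T>> data_;  // keeps the source data alive
    T factor_;
};
```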
One solution is to use a std::shared_ptr<vector<T> > to store the vector in your base class, and use that to return the value of the vector. The base class just returns its member value, and the derived class creates a new vector and returns that via a shared_ptr. Then when the caller doesn't need the returned value any more for the derived class, it will be automatically destroyed.
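A sketch of this shared_ptr-based variant, keeping the question's class names but replacing `ref()` with a hypothetical `values()` that returns std::shared_ptr<const std::vector<T>>. The primary series hands out another owner of its existing vector (no copy); the derived series builds a transformed vector per call, which is destroyed when the caller's last shared_ptr goes away. It assumes the source series outlives the Derived_Series that refers to it.

```cpp
#include <memory>
#include <utility>
#include <vector>

template <typename T>
struct Series {
    virtual ~Series() = default;
    // The caller shares ownership for exactly as long as it needs the data.
    virtual std::shared_ptr<const std::vector<T>> values() const = 0;
};

template <typename T>
class Primary_Series : public Series<T> {
public:
    explicit Primary_Series(std::vector<T> v)
        : m_data(std::make_shared<std::vector<T>>(std::move(v))) {}
    // No copy: returns another owner of the existing vector.
    std::shared_ptr<const std::vector<T>> values() const override { return m_data; }

private:
    std::shared_ptr<std::vector<T>> m_data;
};

template <typename T>
class Derived_Series : public Series<T> {
public:
    Derived_Series(const Series<T>& src, T factor) : m_src(src), m_factor(factor) {}
    // Builds a fresh transformed vector on each call; it is freed when the
    // last shared_ptr held by the caller is released.
    std::shared_ptr<const std::vector<T>> values() const override {
        auto base = m_src.values();
        auto out = std::make_shared<std::vector<T>>();
        out->reserve(base->size());
        for (const T& x : *base) out->push_back(x * m_factor);
        return out;
    }

private:
    const Series<T>& m_src;  // assumption: the source outlives this object
    T m_factor;
};
```

The trade-off is that every Derived_Series::values() call allocates and fills a full vector, so this keeps the temporary's lifetime short rather than avoiding it.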
Alternatively, you can design your class to mimic the interface of an std::vector<T>, but design the base class so it returns the transformed values instead of the regular values. That way, no return is ever necessary. If you don't want to write methods for all of the functions a std::vector<T> has, you could just make some sort of transforming iterator that can iterate over and transform a std::vector<T>. Then you don't even have to have a complicated class hierarchy.
One way would be to define your own iterator and make your vector<T> private. Basically, you will have pure virtual accessors to begin() and end(). And the Derived_Series will just wrap the iterator of the Primary_Series and transform values on the fly.

Should there be an Iterator within another iterator for hierarchical data?

I want to maintain an iterator for my list data, but each list element contains another list, and I want to maintain an iterator to that as well. Should I maintain an iterator within an iterator? Should the two iterators be separate?
// iterator interface
template <typename Object>
class Iterator
{
public:
    virtual ~Iterator() = default;
    virtual bool hasNext() const = 0;
    virtual Object getCurrentItem() const = 0;
    virtual Object next() = 0;
    virtual bool remove() = 0;
};
// this iterator will iterate over the following list
std::vector<MY_STRUCT> mList;
Now MY_STRUCT has another std::vector in it, to which I also need an iterator. The following sample code represents it:
struct MY_STRUCT
{
    int numOfObjects;
    std::vector<int> data; // I need an iterator for this one too!
};
I need to maintain iterator to both these lists so my application can at any time know what are the current selected items.
My question again: should these iterators be separate, or should one live inside the other to mirror the data structure?
I would say keep one inside the other rather than separate. It seems like good coding practice to keep related code together. Also, rather than create a new iterator class, you can create an instance of the iterator object from within itself. That way, when you're doing OOP, your iterator class will have a structure similar to your struct (which keeps your code more manageable and legible).
Do all users that want to iterate over the outer type also want to iterate over the inner sequence? If the answer is no, then the clear answer is to keep them separate.
Even if all use cases want to iterate over both dimensions, the question still stands in a different form: do you want to require that every iteration goes over both dimensions completely? Does that make sense?
I personally would avoid it. You iterate over one dimension and get an Object; for each object, if the user wants, she can also iterate over the inner ints. Binding these orthogonal concepts into a single iteration is not going to help you in any way and is probably going to preclude some use cases. At the same time, it will make the code harder to maintain. If they are separate and you add one extra subelement, you need to provide iteration over that new subelement, but you need not touch any of the existing code that does not depend on the new feature.
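A sketch of keeping the two positions separate for the "current selection" requirement, assuming the MY_STRUCT layout from the question. The `Selection` class is hypothetical, and it stores indices rather than std::vector iterators, since the latter are invalidated if a vector reallocates; moving the outer cursor resets the inner one.

```cpp
#include <cstddef>
#include <vector>

struct MY_STRUCT {
    int numOfObjects;
    std::vector<int> data;
};

// Hypothetical selection state: two independent cursors, one per dimension.
class Selection {
public:
    explicit Selection(const std::vector<MY_STRUCT>& items) : items_(items), outer_(0), inner_(0) {}
    // Choosing a different outer item invalidates the old inner position.
    void selectItem(std::size_t i) { outer_ = i; inner_ = 0; }
    void selectInner(std::size_t j) { inner_ = j; }
    const MY_STRUCT& currentItem() const { return items_.at(outer_); }
    int currentValue() const { return currentItem().data.at(inner_); }

private:
    const std::vector<MY_STRUCT>& items_;  // assumption: the list outlives the selection
    std::size_t outer_;
    std::size_t inner_;
};
```

Code that only cares about the outer items can ignore the inner cursor entirely, which is the separation argued for above.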

Implementing std::list item read/write events

I'm new to C++, but I have set my mind on a specific task that requires a specific chunk of code to be executed whenever any list item is about to be changed or read.
The resulting list should look and behave as much as possible like std::list, except that it lets me execute specific tasks whenever the list is about to be read or written.
From what I have found out so far, all I could think of is deriving a class from list::iterator and overloading its operator* and operator= to implement these tasks.
Then I should derive a class from std::list and make it use my new iterator type by redefining the begin() and end() methods. Or is there a better way of making it use a custom iterator?
That would handle iterator access, but I can see that lists can even return pointers to their members. I guess there is nothing I can do about those and will have to remove this feature from my new list class.
I would appreciate your opinion on this subject.
Deriving from std::list is almost certainly not the answer. The collections in the STL are simply not meant to be derived from, and doing so will cause you problems down the road.
The classic example of why is the destructor problem. The destructors of STL collections are not virtual, so any logic you place in your derived class's destructor is skipped when an object is deleted through a pointer to std::list (formally, this is undefined behavior). For example:
std::list<int>* pList = new YourNewListClass(); // YourNewListClass derives from std::list<int>
delete pList; // undefined behavior: ~list() is not virtual, so ~YourNewListClass() never runs
You'd also need to override much more than the methods on the iterator. It would require changing every method which can possibly mutate the collection.
A more stable approach would be to implement your own std::list-style class that follows standard STL container behavior. You could then use this list in the places where you want events, without running into the problems of deriving from std::list.
Look at the way std::stack and std::queue implement their functionality as adaptors of another standard collection (std::deque by default). This is the way to go: use composition, not inheritance, and wrap the underlying collection to provide your new facilities. For bonus points, template your implementation on the underlying collection. For many uses a std::vector will outperform a std::list, and your additions should work equally well with whichever of these the user chooses.
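A minimal sketch of the composition approach, assuming the only operations you need are appending and reading every element. `ObservedSequence` and its hooks are hypothetical names; the underlying container is a template parameter (std::list by default), and reads are offered only through a visitor so that none can bypass the read hook.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Hypothetical wrapper: holds a standard sequence by composition (no
// inheritance) and calls user hooks on every read and write it performs.
template <typename T, typename Container = std::list<T>>
class ObservedSequence {
public:
    using Hook = std::function<void(const T&)>;

    ObservedSequence(Hook onRead, Hook onWrite)
        : onRead_(std::move(onRead)), onWrite_(std::move(onWrite)) {}

    void push_back(const T& value) {
        onWrite_(value);               // runs before the element is stored
        items_.push_back(value);
    }

    // Reads go through a visitor, so every element access is observed.
    template <typename F>
    void for_each(F f) const {
        for (const T& item : items_) {
            onRead_(item);
            f(item);
        }
    }

    std::size_t size() const { return items_.size(); }

private:
    Container items_;                  // std::list by default; std::vector also works
    Hook onRead_;
    Hook onWrite_;
};
```

Handing out raw iterators or pointers would let callers bypass the hooks, which is the same limitation the question noticed for pointers to members.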
First things first: NEVER derive a class from STL containers, as they are not meant to be derived from. For starters, their destructors are not virtual.
The easiest way would be to contain a std::list in your own class and provide list-like interfaces. In these list-like interfaces you can perform any additional tasks you want before/after updating the list.
Also, take a look at this design pattern: Decorator
Take a look at boost::transform_iterator<>. It seems to be close to what you're looking for. It calls a functor whenever the operator*() function of the transform_iterator<> is called. The intended use is to transform the object the iterator points to, but there's nothing that says the functor can't do something else and simply return the original value of the pointed-to object.
Even if it's not quite what you want, it will probably provide ideas to how you might approach your problem.