Why is begin() needed in std::vector erase?

Why do we have to write v.erase(v.begin(), v.begin()+3)?
Why isn't it defined as erase(int, int) so you can write v.erase(0,2) and the implementation takes care of the begin()s?

The interface container.erase(iterator, iterator) is more general and works for containers that don't have indexing, like std::list. This is an advantage if you write templates and don't really know exactly which container the code is to work on.
The original design aimed at being as general as possible, and iterators are more general than indexes. The designers could have added extra index-based overloads for vector, but decided not to.
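If you really want the index-based form, it is easy to layer on top of the iterator interface yourself. Here is a minimal sketch; erase_index_range is a made-up name (nothing like it exists in the standard library), and it treats the indices as a half-open range [first, last):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the standard library:
// erases the elements at indices [first, last) from a vector.
template <typename T>
void erase_index_range(std::vector<T>& v, std::size_t first, std::size_t last) {
    assert(first <= last && last <= v.size());  // precondition: valid half-open range
    using diff_t = typename std::vector<T>::difference_type;
    // Translate the indices into iterators and defer to the standard erase.
    v.erase(v.begin() + static_cast<diff_t>(first),
            v.begin() + static_cast<diff_t>(last));
}
```

With this, erase_index_range(v, 0, 3) does the same as v.erase(v.begin(), v.begin()+3).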

In the STL, iterators are the only entity that provides general access to STL containers.
The array data structure can be accessed via pointers and indexes. Iterators are a generalization of these indexes/pointers.
A linked list can be accessed by moving pointers (a la ptr = ptr->next). Iterators are a generalization of these as well.
Trees and hash tables need a special iterator class that encapsulates the logic of iterating over these data structures.
As you can see, iterators are the general type that allows you to do common operations (such as iteration, deletion, etc.) on data structures, regardless of their underlying implementation.
This way, you can refactor your code to use std::list and container.erase(it0, it1) still works without modifying the code.
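To illustrate the refactoring point, here is a small sketch of a function template that compiles unchanged for std::vector and std::list. The name drop_front is invented for this example, and the caller is assumed to pass n no larger than the container size:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: removes the first n elements of any sequence container
// that offers erase(iterator, iterator). The caller must ensure n <= c.size().
template <typename Container>
void drop_front(Container& c, std::size_t n) {
    // std::next works for random access iterators (vector) as well as
    // bidirectional ones (list), so no indexing is needed.
    auto last = std::next(c.begin(),
        static_cast<typename Container::difference_type>(n));
    c.erase(c.begin(), last);
}
```

An index-based erase(0, 3) could not be written this way for std::list, because a list has no operator[].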

Related

When creating my own data structure, should I use iterators or indices to provide access from the outside?

Suppose I'm writing a project in a modern version of C++ (say 11 or 14) and use STL in that project. At a certain moment, I need to program a specific data structure that can be built using STL containers. The DS is encapsulated in a class (am I right that encapsulating the DS in a class is the only correct way to code it in C++?), thus I need to provide some sort of interface to provide read and/or write access to the data. Which leads us to the question:
Should I use (1a) iterators or (1b) simple "indices" (i.e. numbers of a certain type) for that? The DS that I'm working on right now is pretty much linear, but then when the elements are removed, of course simple integer indices are going to get invalidated. That's about the only argument against this approach that I can imagine.
Which approach is more idiomatic? What are the objective technical arguments for and against each one?
Also, when I choose to use iterators for my custom DS, should I (2a) publicly typedef the iterators of the container that is used internally or (2b) create my own iterator from scratch? In open libraries such as Boost, I've seen custom iterators being written from scratch. On the other hand, I feel I'm not able to write a proper iterator yet (i.e. one that is as detailed and complex as the ones in STL and/or Boost).
Edit as per @πάντα ῥεῖ's request:
I've asked myself this question with a few DS in a few projects while studying at the Uni, but here's the last occurrence that made me come here and ask.
The DS is meant to represent a triangle array, or vertex array, or whatever one might call it. Point is, there are two arrays or lists, one storing the vertex coordinates, and another one storing triplets of indices from the first array, thus representing triangles. (This has been coded a gazillion times already, yet I want to write it on my own, once, for the purpose of learning.) Obviously, the two arrays should stay in sync, hence the encapsulation. The set of operations is meant to include adding (maybe also removing) a vertex, adding and removing a triangle (a vertex triplet) using the vertex data from the same array. How I see it is that the client adds vertices, writes down the indices/iterators, and then issues a call to add a triangle based on those indices/iterators, which in turn returns another index/iterator to the resulting triangle.
I don't see why you couldn't get both, if this makes sense for your container.
std::vector has iterators and the at/operator[] methods to provide access with indexes.
The API of your container depends on the operations you want to make available to your clients.
Is the container iterable, i.e. is it possible to iterate over each element? Then you should provide an iterator.
Does it make sense to randomly access elements in your container, knowing their index? Then you can also provide the at(size_t)/operator[](size_t) methods.
Does it make sense to randomly access elements in your container, knowing a special "key"? Then you should probably provide the at(key_type)/operator[](key_type) methods.
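As a rough sketch of "getting both", here is a thin wrapper that exposes iteration and positional access at the same time. IntBag and its members are hypothetical names chosen for the example, and the class simply forwards to an internal std::vector<int>:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper around std::vector<int> that exposes both access styles.
class IntBag {
public:
    using const_iterator = std::vector<int>::const_iterator;

    void add(int value) { items_.push_back(value); }
    std::size_t size() const { return items_.size(); }

    // Iteration: forwards the iterators of the underlying container.
    const_iterator begin() const { return items_.begin(); }
    const_iterator end() const { return items_.end(); }

    // Random access by position: at() throws std::out_of_range on a bad index,
    // operator[] does no bounds checking.
    int at(std::size_t i) const { return items_.at(i); }
    int operator[](std::size_t i) const { return items_[i]; }

private:
    std::vector<int> items_;
};
```

Clients can then write for (int x : bag) as well as bag[0] or bag.at(1).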
As for your question regarding custom iterators or reuse of existing iterators:
If your container is basically a wrapper that adds some insertion/removal logic to an existing container, I think it is fine to publicly typedef the existing iterator, as a custom iterator may miss some features of the existing iterator, may contain bugs, and will not add any significant feature over the existing iterator.
On the other hand, if you iterate in a non-standard fashion, you need a custom iterator. For instance, I once implemented a recursive_unordered_map that accepted a parent recursive_unordered_map at construction and would iterate both over its own unordered_map and over its parent's (and its parent's parent's, and so on). I had to implement a custom iterator for this.
Which approach is more idiomatic?
Using iterators is definitely the way to go. Functions in <algorithm> don't work with indices. They work with iterators. If you want your container to be enabled for use by the functions in <algorithm>, using iterators is the only way to go.
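A short sketch of why this matters: the function below calls std::count_if and therefore works with anything that hands out iterators through begin() and end(), including a custom class that forwards its internal container's iterators. The name count_long_words is made up for the example:

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical helper: counts the words longer than `limit` in any container
// of std::string that provides begin() and end().
template <typename Container>
std::ptrdiff_t count_long_words(const Container& words, std::size_t limit) {
    return std::count_if(words.begin(), words.end(),
        [limit](const std::string& w) { return w.size() > limit; });
}
```

The same call compiles for std::vector<std::string> and std::list<std::string>; an index-only interface would have to be adapted before any <algorithm> function could use it.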
In general, it is recommended that the class offer its own iterator. Under the hood, it could be an index or an STL iterator (preferred). But as far as external clients and public APIs are concerned, they only deal with the iterator offered by the class.
Example 1
#include <string>
#include <unordered_map>

class Dictionary {
private:
    typedef std::unordered_map<std::string, std::string> DictType;
public:
    typedef DictType::iterator DictionaryIterator;
};
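Example 2
#include <cstddef>
#include <string>
#include <vector>

class Sequence {
private:
    typedef std::vector<std::string> SeqType;
public:
    struct SeqIterator {
        std::size_t index;
        SeqIterator& operator++();  // pre-increment: advance to the next element
        std::string operator*();    // access the current element
    };
};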
If the clients are operating solely on SeqIterator, then the above can later be modified to
#include <deque>
#include <string>

class Sequence {
private:
    typedef std::deque<std::string> SeqType;
public:
    typedef SeqType::iterator SeqIterator;
};
without the clients getting affected.
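Example 2 only declares the iterator, so here is one possible way to flesh it out. This is a sketch under my own assumptions: the iterator keeps a pointer to the vector next to the index, and add(), begin() and end() are names I introduced. It is enough for a range-based for loop, but it omits the member typedefs that functions in <algorithm> expect.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Sequence {
public:
    // Index-based iterator: the index alone is not enough, so it also
    // remembers which vector it refers to (an assumption of this sketch).
    class SeqIterator {
    public:
        SeqIterator(const std::vector<std::string>* data, std::size_t index)
            : data_(data), index_(index) {}
        const std::string& operator*() const { return (*data_)[index_]; }
        SeqIterator& operator++() { ++index_; return *this; }
        bool operator==(const SeqIterator& o) const {
            return data_ == o.data_ && index_ == o.index_;
        }
        bool operator!=(const SeqIterator& o) const { return !(*this == o); }
    private:
        const std::vector<std::string>* data_;
        std::size_t index_;
    };

    void add(std::string s) { data_.push_back(std::move(s)); }
    SeqIterator begin() const { return SeqIterator(&data_, 0); }
    SeqIterator end() const { return SeqIterator(&data_, data_.size()); }

private:
    std::vector<std::string> data_;
};
```

Client code such as for (const std::string& w : seq) only touches SeqIterator, which is what makes the later switch to a deque iterator invisible to it.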

Custom iterator for multiple containers in C++

I have a pure abstract class and two derived classes that I use to store the same kind of data, let's say int, but in different data structures, let's say a map and a vector.
// MyIterator and the key type K are left undefined here, as in the question.
class AbstractContainer {
public:
    virtual ~AbstractContainer() = default;
    virtual MyIterator firstValue() = 0;
};

class ContainerMap : public AbstractContainer {
private:
    std::map<K, int> values;
public:
    // Returns an iterator over the map values (int).
    MyIterator firstValue() override;
};

class ContainerVector : public AbstractContainer {
private:
    std::vector<int> values;
public:
    // Returns an iterator over the vector values (int).
    MyIterator firstValue() override;
};
In ContainerMap I can subclass map<K, int>::iterator to iterate over the map values.
But how can I define a generic iterator MyIterator, independent of the data structure, in such a way that given a pointer of type AbstractContainer I can iterate over the values ignoring the actual structure storing the data? And besides that, is this a good practice?
Edit
This question is a simplification of the problem. In my project one of the subclasses stores my objects in memory (in a std::map) while the other retrieves the objects from an external database. I am trying to create a common interface to access the collection of objects that is independent of the data source, because the operations (search, insertion and deletion) would be exactly the same.
Well, no, it's not good practice.
The reason that there is more than one container type (for example, in the STL) is that there is no single container that is optimised for everything. So, one container type might be better suited to a use case where elements are inserted into a container once and it is iterated over multiple times, and another container might be better suited to code that needs to repeatedly add and remove elements from the middle.
The reason STL containers each specify their own iterators is that iterating over each container works in different ways. An iterator suited to working with a vector will - at best - be inefficient on a list and - at worst - will not work correctly.
That said, as in the STL, there is nothing stopping two different containers from using the same name for their iterators. So Container_X and Container_Y can both have an iterator named Iterator, but Container_X::Iterator does not need to work the same way as Container_Y::Iterator.
You're not the first person who wants code that is container agnostic (although you've worded it effectively as "agnostic to the iterator"). And you won't be the last. Unless some great mind manages to specify a container type with all operations optimal for all possible use cases (in contrast with the current state of play which is that each container type is optimal for some use cases but poor for others) container agnostic code is a futile goal. An iterator that can work across all containers will probably be maximally inefficient, for numerous measures, for one or more operations on most, if not all, of the different container types.
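What you would want to do is create a separate iterator class that inherits from the C++ standard iterator class (std::iterator, which is deprecated since C++17; defining the iterator member typedefs directly is the current alternative). Then, you would need to implement all the standard iterator functions within your iterator class (i.e., dereference, ++, ==, !=, etc.).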
Within your data structures you would want to have a function that will return the successor node/value from any point within the structure - this function will be called by the iterator's overloaded ++ operator in order to move to the next node/value in the data structure, in the order you want. For example, for a vector, given an index, you'd want your successor method to return the index that follows the given index in the vector.
From what I understand, though, you want your iterator to be generic so that you could use the same iterator class for more than one data structure. This might be possible through the use of templates and checks within your iterator class; however, it will probably not be a very robust implementation, so I would not recommend it.
Is it bad practice to subclass a standard library data structure and do what you're trying to do here? In the real world it would probably be considered bad practice, yes. But for experimentation purposes or a personal project, I'm sure it would be a good learning experience!
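For the map case, the custom-iterator approach described in this answer could look roughly like the sketch below. MapValueIterator is a hypothetical name, the key type is assumed to be std::string (the question leaves it open), and the class wraps the map iterator by composition instead of subclassing it. It declares the usual member typedefs itself, so it does not need std::iterator.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Hypothetical iterator that walks a std::map<std::string, int> but
// dereferences to the mapped int only.
class MapValueIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = int*;
    using reference = int&;

    using MapIter = std::map<std::string, int>::iterator;

    MapValueIterator() = default;
    explicit MapValueIterator(MapIter it) : it_(it) {}

    reference operator*() const { return it_->second; }
    pointer operator->() const { return &it_->second; }
    MapValueIterator& operator++() { ++it_; return *this; }
    MapValueIterator operator++(int) {
        MapValueIterator old = *this;
        ++it_;
        return old;
    }
    bool operator==(const MapValueIterator& o) const { return it_ == o.it_; }
    bool operator!=(const MapValueIterator& o) const { return it_ != o.it_; }

private:
    MapIter it_{};
};
```

This only covers one container. It does not give you a single MyIterator type that also works for the vector, which is the part the answers above advise against.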

C++ create a char* iterator

I like to write container agnostic code that uses std methods like std::distance() or std::advance() in my code. This is useful for deserialization where I can pass in buffer objects of different types (network stream, byte stream, ...).
How can I convert char* or uint8_t* pointers to an iterator? Copying the data to a buffer is not an option.
One option I have in mind is to use a custom allocator with std::string, but I'd prefer a more ready-made solution if available. Any suggestions?
There are several types of iterators, specified by what properties they have (the functionality they support). There is a nice overview here: http://www.cplusplus.com/reference/iterator/
Random access iterators must implement all the iterator functionality listed in that table.
Raw pointers do in fact support all of these operations, so they are random access iterators and can be used with all STL algorithms. This is also discussed in "Can raw pointers be used instead of iterators with STL algorithms for containers with linear storage?".
Although not necessary, it might still be useful to implement an iterator wrapper for your pointers - this is also discussed in the answers to the question above.
Never mind. Those pointers work as iterators anyway because they implement the basic functionality.
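To make that concrete, here is a small sketch of iterator-generic deserialization code that accepts plain uint8_t* or char* pointers without copying anything. read_u16_be is a hypothetical helper written for this example; it assumes big-endian data and that at least two bytes remain in the buffer:

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>

// Hypothetical deserialization helper: reads a big-endian 16-bit value and
// advances the iterator past it. Works with any iterator over bytes,
// including raw pointers. The caller must ensure two bytes are available.
template <typename ByteIt>
std::uint16_t read_u16_be(ByteIt& it) {
    auto hi = static_cast<std::uint8_t>(*it);
    std::advance(it, 1);
    auto lo = static_cast<std::uint8_t>(*it);
    std::advance(it, 1);
    return static_cast<std::uint16_t>((hi << 8) | lo);
}
```

Given const std::uint8_t* p pointing into a buffer, read_u16_be(p) works directly, and std::distance(p, end) reports how many bytes are left.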

Can I have a C++ map where multiple keys reference the value without using pointers?

From a C background I find myself falling back into C habits where there is generally a better way. In this case I can't think of a way to do this without pointers.
I would like
struct foo {
int i;
int j;
};
mymap['a'] = foo
mymap['b'] = bar
As long as only one key references a value mymap.find will return a reference so I can modify the value, but if I do this:
mymap['c'] = mymap.find('a') // problematic because foo is copied right?
The goal is to be able to find 'a' or 'c' modify foo and then the next find of 'a' or 'c' will show the updated result.
No, you will need to use pointers for this. Each entry in the map maintains a copy of the value assigned, which means that you cannot have two keys referring to the same element. If you store pointers to the element instead, then two keys will refer to two separate pointers that point to the exact same in-memory element.
As for implementation details, std::map is typically implemented as a balanced tree in which each node contains a std::pair<const Key,Value> object (plus extra information for the tree structure). When you do m[ key ], the node containing the key is looked up, or a new node is created in the tree, and a reference to the Value subobject of the pair is returned.
I would use std::shared_ptr here. You have an example of shared ownership, and shared_ptr is made for that. While pointers tend to be overused, there is nothing wrong with using them when necessary.
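A minimal sketch of the shared_ptr approach, reusing the foo struct from the question. SharedFooMap and add_alias are names invented for this example:

```cpp
#include <map>
#include <memory>

struct foo {
    int i;
    int j;
};

// Two keys can share one foo when the map stores shared_ptr<foo>.
using SharedFooMap = std::map<char, std::shared_ptr<foo>>;

// Hypothetical helper: makes `alias` refer to the same object as `existing`.
// Returns false if `existing` is not in the map.
inline bool add_alias(SharedFooMap& m, char existing, char alias) {
    auto it = m.find(existing);
    if (it == m.end()) return false;
    m[alias] = it->second;  // copies the pointer, not the foo
    return true;
}
```

After m['a'] = std::make_shared<foo>(foo{1, 2}) and add_alias(m, 'a', 'c'), a change made through m['c']->i is also visible through m['a']->i, since both keys point to the same object.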
Boost.Intrusive
Boost.Intrusive is a library presenting some intrusive containers to
the world of C++. Intrusive containers are special containers that
offer better performance and exception safety guarantees than
non-intrusive containers (like STL containers).
The performance benefits of intrusive containers makes them ideal as a
building block to efficiently construct complex containers like
multi-index containers or to design high performance code like memory
allocation algorithms.
While intrusive containers were and are widely used in C, they became
more and more forgotten in C++ due to the presence of the standard
containers which don't support intrusive techniques. Boost.Intrusive
not only reintroduces this technique to C++, but also encapsulates the
implementation in STL-like interfaces. Hence anyone familiar with
standard containers can easily use Boost.Intrusive.

Concurrent factory/flyweight with TBB

I have a flyweight pattern working in serial where the factory uses std::map to store and provide access to the created objects. The factory returns an iterator that points to the object in the map. The objects in the factory are constants, so they will not be updated once inserted, unless they are erased.
I would like to make the factory concurrent using tbb::concurrent_hash_map, but I am unsure what the return should be. I could use an iterator (should it be const_iterator?), but the documentation says that all iterators are invalidated when something does a find or insert in the concurrent_hash_map. So I could use a const_accessor since only read-only access is needed, but then this is different from the serial implementation (iterator vs accessor).
Which one is better to use? Should consistency in types (i.e., both iterators) be important? Both serial and threaded compile-time options need to be there.
If you do not erase elements while other threads are accessing the map, you may use tbb::concurrent_unordered_map instead. This is also a hash-based associative container, but with a simpler and more STL-like API. It does not invalidate iterators on insert and find, but as a tradeoff, it does not allow concurrent removal of elements.
If you do need to remove elements concurrently, the only choice with TBB is to use tbb::concurrent_hash_map with accessors.
I also suggest discussing your use case on the TBB forum.