Why are STL algorithms called differently for different classes?

Why are STL algorithms called differently for different classes? - c++

I am currently looking at STL libraries and am wondering why for a vector class vector<string> names; I have to call remove(); as follows:
names.erase(remove(names.begin(), names.end(), "Simon"), names.end());
While when using a list class list<string> names; I can call the function as follows:
remove("Simon");
I also notice the same for reverse(); for vector<string> names; it is called as follows:
reverse(names.begin(), names.end());
While for list<string> names; it is called as follows:
names.reverse();
Is it more proper to always call the way the vector would? why is this? I'm very new to C++ so I'm keen to know the best way to do things.

Basically, there are some special cases that have to do with the nature of specific containers.
In general the free functions std::remove, std::remove_if, and std::reverse declared in the <algorithm> header will work on vectors, lists, deques, and arrays by copying and moving elements. (They will not, of course, work on sets or maps, since for those you are not free to rearrange elements willy-nilly.) Note that std::remove does not delete elements from containers.
In general the member function erase of each container type is used to delete elements from that container. (Note that std::array does not have erase because its size is fixed.)
There are special cases:
std::list provides reverse, as a member because only the member function can guarantee that it does not invalidate any iterators; the generic std::reverse cannot.
The same goes for remove and remove_if though the names are misleading because unlike the free functions, the members do delete elements from the list.
There is also a member sort for std::list because the generic std::sort only works on random access iterators.
For sets and maps we should use their member lower_bound, upper_bound, equal_range, and count instead of the generic versions, since they know how to walk down the tree and get the result in logarithmic time whereas the free functions will use linear time.
Generally, the principle appears to be: the standard library containers support uniform interfaces as far as possible, but also provide additional specialized functions in order to provide functionality that depends on their internals.

Related

why is begin() needed in std::vector erase?

Why do we have to write v.erase(v.begin(), v.begin()+3) ?
Why isn't it defined as erase(int, int) so you can write v.erase(0,2) and the implementation takes care of the begin()s?

The interface container.erase(iterator, iterator) is more general and works for containers that don't have indexing, like std::list. This is an advantage if you write templates and don't really know exactly which container the code is to work on.
The original design aimed at being as general as possible, and iterators are more general than indexes. The designers could have added extra index-based overloads for vector, but decided not to.

In STL, iterators are the only entity that provides general access to STL containers.
The array data structure can be accessed via pointers and indexes. iterators are generalization of these indexes/pointers.
Linked list can be accessed with moving pointers (a la ptr = ptr->next). iterators are generalization to these.
Trees and HashTables need special iterator class which encapsulates the logic of iterating these data structures.
As you can see, iterators are the general type which allows you to do common operations (such as iteration, deletion etc.) on data structures, regardless their underlying implementation.
This way, you can refactor your code to use std::list and container.erase(it0, it1) still works without modifying the code.

When creating my own data structure, should I use iterators or indices to provide access from the outside?

Suppose I'm writing a project in a modern version of C++ (say 11 or 14) and use STL in that project. At a certain moment, I need to program a specific data structure that can be built using STL containers. The DS is encapsulated in a class (am I right that encapsulating the DS in a class is the only correct way to code it in C++?), thus I need to provide some sort of interface to provide read and/or write access to the data. Which leads us to the question:
Should I use (1a) iterators or (1b) simple "indices" (i.e. numbers of a certain type) for that? The DS that I'm working on right now is pretty much linear, but then when the elements are removed, of course simple integer indices are going to get invalidated. That's about the only argument against this approach that I can imagine.
Which approach is more idiomatic? What are the objective technical arguments for and against each one?
Also, when I choose to use iterators for my custom DS, should I (2a) public-ly typedef the iterators of the container that is used internally or (2b) create my own iterator from scratch? In the open libraries such as Boost, I've seen custom iterators being written from scratch. On the other hand, I feel I'm not able to write a proper iterator yet (i.e. one that is as detailed and complex as the ones in STL and/or Boost).
Edit as per #πάντα ῥεῖ request:
I've asked myself this question with a few DS in a few projects while studying at the Uni, but here's the last occurrence that made me come here and ask.
The DS is meant to represent a triangle array, or vertex array, or whatever one might call it. Point is, there are two arrays or lists, one storing the vertex coordinates, and another one storing triplets of indices from the first array, thus representing triangles. (This has been coded a gazillion times already, yet I want to write it on my own, once, for the purpose of learning.) Obviously, the two arrays should stay in sync, hence the encapsulation. The set of operations is meant to include adding (maybe also removing) a vertex, adding and removing a triangle (a vertex triplet) using the vertex data from the same array. How I see it is that the client adds vertices, writes down the indices/iterators, and then issues a call to add a triangle based on those indices/iterators, which in turn returns another index/iterator to the resulting triangle.

I don't see why you couldn't get both, if this makes sense for your container.
std::vector has iterators and the at/operator[] methods to provide access with indexes.
The API of your container depends on the operations you want to make available to your clients.
Is the container iterable, i.e. is it possible to iterate over each elements? Then, you should provide an iterator.
Does it make sense to randomly access elements in your container, knowing their address? Then you can also provide the at(size_t)/operator[size_t] methods.
Does it make sense to randomly access elements in your container,
knowing a special "key"? The you should probably provide the at(key_type)/operator[key_type] methods.
As for your question regarding custom iterators or reuse of existing iterators:
If your container is basically a wrapper that adds some insertion/removal logic to an existing container, I think it is fine to publicly typedef the existing iterator, as a custom iterator may miss some features of the the existing iterator, may contain bugs, and will not add any significant feature over the existing iterator.
On the other hand, if you iterate in a non-standard fashion (for instance, I implemented once a recursive_unordered_map that accepted a parent recursive_unordered_map at construction and would iterate both on its own unordered_map and on its parent's (and its parent's parent's...). I had to implement a custom iterator for this.

Which approach is more idiomatic?
Using iterators is definitely the way to go. Functions in <algorithm> don't work with indices. They work with iterators. If you want your container to be enabled for use by the functions in <algorithm>, using iterators is the only way to go.

In general, it is recommended that the class offers its own iterator. Under the hood, it could be an index or a STL iterator (preferred). But, as long as external clients and public APIs are concerned, they only deal with the iterator offered by the class.
Example 1
class Dictionary {
private:
typedef std::unordered_map<string, string> DictType;
public:
typedef DictType::iterator DictionaryIterator;
};
Example 2
class Sequence {
private:
typedef std::vector<string> SeqType;
public:
struct SeqIterator {
size_t index;
SeqIterator operator++();
string operator*();
};
};
If the clients are operating solely on SeqIterator, then the above can later be modified to
class Sequence {
private:
typedef std::deque<string> SeqType;
public:
typedef SeqType::iterator SeqIterator;
};
without the clients getting affected.

Possible swap map elements without copy?

I have a map<int,map<int,string>> themap
I would like to swap elements themap[1] and themap[2]. But the inside maps map<int,string> are very big so I don't want them copied. Is there way to do this or do I have to change themap to use pointers.

You can try std::map::swap for the outer map:
void swap( map& other );
Exchanges the contents of the container with those of other. Does not invoke any move, copy, or swap operations on individual elements.

themap[1].swap(themap[2]);
This won't copy, or even move, any elements in the maps. It will likely just swap ownership of the root node, just a few pointer assignments.

You can use std::swap since it has a specialization for std::map (and many other containers and library types).
std::swap(themap[1], themap[2]);
It does call std::map::swap but I think it's useful to get in the habit of using std::swap and trusting the standard committee and library implementers to do the right thing.

Custom iterator for multiple containers in C++

I have a pure abstract class and two derived classes that I use to store the same kind of data, let's say int, but in different data structures, let's say a map and a vector.
class AbstractContainer {
public:
virtual MyIterator firstValue() = 0;
}
class ContainerMap : public AbstractContainer {
private:
map<K, int>;
public:
MyIterator firstValue() { // return iterator over map values (int) }
}
class ContainerVector : public AbstractContainer {
private:
vector<int>;
public:
MyIterator firstValue() { // return iterator over vector values (int) }
}
In ContainerMap I can subclass map<K, int>::iterator to iterate over the map values.
But how can I define a generic iterator MyIterator, independent of the data structure, in such a way that given a pointer of type AbstractContainer I can iterate over the values ignoring the actual structure storing the data? And besides that, is this a good practice?
Edit
This question is a simplification of the problem. In my project one of the subclasses store my objects in memory (in a std::map) while the other retrieves the objects from an external database. I am trying to create a common interface to access the collection of objects, that is independent of the data source because the operations (search, insertion and deletion) would be exactly the same.

Well, no, it's not good practice.
The reason that there is more than one container type (for example, in the STL) is that there is no single container that is optimised for everything. So, one container type might be better suited to a use case where elements are inserted into a container once and it is iterated over multiple times, and another container might be better suited to code that needs to repeatedly add and remove elements from the middle.
The reason STL containers each specify their own iterators is that iterating over each container works in different ways. An iterator suited to working with a vector will - at best - be inefficient on a list and - at worst - will not work correctly.
That said, as in the STL, there is nothing stopping two different containers using the same name for their iterators. So Container_X and Container_y can both have an iterator named Iterator, but Container_X::Iterator does not need to work the same way as Container_Y::Iterator.
You're not the first person who wants code that is container agnostic (although you've worded it effectively as "agnostic to the iterator"). And you won't be the last. Unless some great mind manages to specify a container type with all operations optimal for all possible use cases (in contrast with the current state of play which is that each container type is optimal for some use cases but poor for others) container agnostic code is a futile goal. An iterator that can work across all containers will probably be maximally inefficient, for numerous measures, for one or more operations on most, if not all, of the different container types.

What you would want to do is create a separate iterator class that inherits from the C++ standard iterator class. Then, you would need to implement all the standard iterator functions within your iterator class (i.e dereference, ++, ==, !=, etc.).
Within your data structures you would want to have a function that will return the successor node/value from any point within the structure - this function will be called by the iterator's overloaded ++ operator in order to move to the next node/value in the data structure, in the order you want. For example, for a vector, given an index, you'd want your successor method to return the index that follows the given index in the vector.
From what I understand, though, you want your iterator to be generic so that you could use the same iterator class for more than one data structure. This might be possible through the use of templates and checks within your iterator class; however, it will probably not be a very secure implementation - not recommended.
Is it bad practice to subclass a standard library data structure and do what you're trying to do here? In the real world it would probably be considered bad practice, yes. But for experimentation purposes or a personal project, I'm sure it would be a good learning experience!

C++ create a char* iterator

I like to write container agnostic code that uses std methods like std::distance() or std::advance() in my code. This is useful for deserialization where I can pass in buffer objects of different types (network stream, byte stream, ...).
How can I convert char* or uint8_t* pointers to an iterator? Copying the data to a buffer is not an option.
One option I have in mind is to use a custom allocator with std::string but I'd prefer a more ready made solution if available.Any suggestions?

There are several types of iterators, specified by what properties they have (functionality they support) - there is a nice overview here http://www.cplusplus.com/reference/iterator/
Random access iterators require to implement all the iterator functionality seen in that table.
Raw pointers do in fact support all the operations and are therefore random access operators iterators and can be used for all STL algorithms and containers. Also discussed here Can raw pointers be used instead of iterators with STL algorithms for containers with linear storage?.
Although not necessary, it might still be useful to implement an iterator wrapper for your pointers - this is also discussed in the answers to the question above.

Nevermind. Those pointers work as iterators anyway because they implement the basic functionality.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js