What are Iterators, C++? - c++

What are Iterators in C++?

Iterators are a way of traversing a collection of objects. Typically, they allow you to access an STL (Standard Template Library) container sequentially in ways similar to accessing a classical C array with a pointer. To access an object through an iterator, you dereference it like a C pointer. To access the next object in a collection, you use the increment (++) operator. Some containers have multiple kinds of iterators that allow you to traverse the collection in different ways.
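As a minimal sketch of the pointer analogy described above (the function names `sum_array`, `sum_container` and `traversal_demo` are made up for illustration), the same loop shape can walk a C array with a pointer and a standard container with an iterator:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Sum a C array by walking a raw pointer from the first element
// to one past the last.
int sum_array(const int* first, std::size_t length) {
    int total = 0;
    for (const int* p = first; p != first + length; ++p) {
        total += *p;  // dereference the pointer to read the element
    }
    return total;
}

// Sum any standard container of ints the same way, using its iterators.
template <typename Container>
int sum_container(const Container& values) {
    int total = 0;
    for (auto it = values.begin(); it != values.end(); ++it) {
        total += *it;  // dereference the iterator, just like the pointer above
    }
    return total;
}

// Self-check: the loop works for an array, a vector, and a list.
void traversal_demo() {
    int raw[] = {1, 2, 3, 4};
    assert(sum_array(raw, 4) == 10);
    assert(sum_container(std::vector<int>{1, 2, 3, 4}) == 10);
    assert(sum_container(std::list<int>{5, 6}) == 11);
}
```

Note that `sum_container` never indexes into the container, which is why it also works for `std::list`, where indexing is not available.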

Though the answer initially seems fairly obvious, this is actually a rather deeper question than you may realize. Along with Paul McJones, Alexander Stepanov (designer of the original STL, for anybody who's not aware of that) recently released a book named Elements of Programming (aka EOP). The entirety of chapter six in that book is devoted specifically to iterators, and quite a bit of the rest of the book relates closely to iterators as well. Anybody who really wants to know iterators in full detail might consider reading this book.
Warning: EOP is not for the faint of heart. It's relatively short (~260 pages), but quite dense. Speaking from experience, the early going is a bit disconcerting. My initial reaction to the first chapter was more or less "well, this is so obvious it's hardly worth reading. I did start programming before last week, after all!"
Fortunately, I did look at the exercises, and tried to do a couple -- and even though I had thought of the subjects as obvious, the exercises demand rigorous proofs. It's a bit like being asked to prove (in a mathematical sense) that water is wet. You end up just about needing to read the chapter a couple of times just to get past your own preconceived notion that you already know the answers, so you can look at the real question -- what does "wet" really mean; what are the fundamental characteristics of "wetness"?

http://en.wikipedia.org/wiki/Iterator
Something that lets you go through everything in an array, one by one.
In C++, I think you're talking about "for_each"... As far as I know, C++ doesn't actually have a "foreach" keyword, unlike languages such as C#. However, the Standard Template Library provides std::for_each (in the <algorithm> header), and C++11 added the range-based for loop.

From p. 80 of Accelerated C++:
An iterator is a value that
Identifies a container and an element in the container
Lets us examine the value stored in that element
Provides operations for moving between elements in the container
Restricts the available operations in ways that correspond to what the container can handle efficiently

They're a representation of a position within a sequence. On their own they're little more than curiosities, but when dereferenced they yield the value stored in the sequence at the position they represent.

If you come from higher-level languages like Java or Python, you may have noticed that C++ doesn't have any built-in complex types, only primitives like int, double or char. Since C++ was designed to be extremely efficient, whenever you need a collection type that holds other values, it makes sense to create a new custom class. In fact, this is how C++ combines lower-level control with higher-level abstractions.
The Standard Template Library provides a standardized collection of such classes that you can use to hold multiple values under a single entity. Primarily, they were created because raw C arrays are not flexible enough, and the containers give developers a smoother experience as they:
dynamically expand and shrink without any developer effort;
provide multiple useful interfaces with the data (size(), upper_bound(), etc);
simplify memory management with Resource Acquisition Is Initialization (RAII).
So far, so good. Standard containers like vector or list give developers greater flexibility without much performance loss. However, since they are custom classes defined with C++ semantics, they also need to provide a way to access the data they hold. One could argue a simple operator[] or next() method could do the trick, and indeed you can do this. However, the STL took inspiration from C and created a way to access container items that does not depend on the container object itself: iterators.
Under the hood, an iterator is often little more than an object that wraps a pointer to a value and overloads a few operators. You can use iterators much as you would use pointers into an array:
you can pass iterators around and iterate through containers even if you don't have access to the container objects themselves;
you can dereference an iterator and move it forward by incrementing it.
This is the primary purpose of iterators: to serve as pointers to container items. Because different containers store their items in different ways (std::vector uses a contiguous block of memory, std::list stores nodes scattered in memory and connected with pointers, std::map is typically a balanced binary search tree ordered by key, while std::unordered_map is the one that hashes its keys), iterators provide a common interface that each container implements in its own way. In fact, this is exactly what allows containers like std::vector, std::array or std::map to be enumerated with range-based for loops:
std::vector<int> grades = {4, 5, 1, 8, 10};
for (int grade : grades) std::cout << grade << " ";
//-> 4 5 1 8 10
This for loop is just syntactic sugar for using iteration:
std::vector<int> grades = {4, 5, 1, 8, 10};
std::vector<int>::iterator it = grades.begin();
for (; it != grades.end(); ++it) std::cout << *it << " ";
//The output is the same.
You may have noticed some common interfaces that iterators share. Specifically, they:
overload the * operator to dereference an item;
overload the ++ operator to move the iterator one item forward;
overload the == and != operators to compare if they point to the same value;
are usually defined as a nested class (you access them with <container>::iterator), although you can define the iterator entirely as a separate class.
Notice also that all containers that support iteration should provide a begin() method that returns an iterator to the first item, as well as end(), which returns an iterator to the position just past the last one. It points past the final item so that it can signal that the iterator has exhausted all items: had end() pointed to the last item, the loop condition would need an ordering comparison such as it <= grades.end(), which not every iterator supports (list iterators, for instance, offer only == and !=), and an empty container would have no valid end() at all. Pointing one past the end lets the loop use a simple != check, and it is the same half-open convention that makes zero-based array indexing convenient. Besides these, there are also the rbegin() and rend() functions, which provide reverse iterators that go from the end to the start: their ++ operator actually moves toward the beginning.
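To illustrate the half-open range and the reverse iterators described above, here is a small sketch. The helper names `reversed_copy`, `count_steps` and `range_demo` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the elements in reverse order by walking rbegin() to rend().
std::vector<int> reversed_copy(const std::vector<int>& values) {
    std::vector<int> result;
    for (auto it = values.rbegin(); it != values.rend(); ++it) {
        result.push_back(*it);  // ++ on a reverse iterator moves toward the front
    }
    return result;
}

// Count elements with the usual half-open [begin, end) loop.
// For an empty container begin() == end(), so the body never runs.
std::size_t count_steps(const std::vector<int>& values) {
    std::size_t steps = 0;
    for (auto it = values.begin(); it != values.end(); ++it) {
        ++steps;
    }
    return steps;
}

// Self-check using the grades from the example above.
void range_demo() {
    const std::vector<int> grades = {4, 5, 1, 8, 10};
    assert((reversed_copy(grades) == std::vector<int>{10, 8, 1, 5, 4}));
    assert(count_steps(grades) == 5);
    assert(count_steps({}) == 0);
}
```

The empty-container case is the one a "points at the last item" convention could not express, since there would be no last item for end() to point to.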
Implementing a custom iterator
To make it completely clear, let's implement our own custom iterator for a wrapper around plain arrays.
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <stdexcept>
template<typename T> class Iterator {
    T* pointer;
public:
    explicit Iterator(T* position) : pointer(position) {}
    T& operator*() const {
        return *pointer; //Dereferencing yields the item itself.
    }
    Iterator& operator++() {
        ++pointer; //This is an array-specific implementation detail.
        return *this;
    }
    //Iterators are equal when they point to the same position,
    //not when the items they point to happen to hold equal values.
    bool operator==(const Iterator& other) const { return pointer == other.pointer; }
    bool operator!=(const Iterator& other) const { return pointer != other.pointer; }
};
template<typename T, std::size_t Capacity> class Array {
    T data[Capacity];
    std::size_t count = 0;
public:
    Array(std::initializer_list<T> args) {
        if (args.size() > Capacity) throw
            std::length_error("Too many initializers.");
        for (const T& value : args)
            data[count++] = value;
    }
    std::size_t size() const { return count; }
    T& operator[](std::size_t index) {
        if (index >= count) throw
            std::out_of_range("Index out of range.");
        return data[index];
    }
    Iterator<T> begin() {
        return Iterator<T>(data); //An array decays to a pointer
                                  //to its first item.
    }
    Iterator<T> end() {
        return Iterator<T>(data + count); //One past the last stored item.
    }
};
int main() {
    Array<int, 6> array = {4, 5, 10, 12, 45, 100};
    Iterator<int> it = array.begin();
    while (it != array.end()) {
        std::cout << *it << " ";
        ++it;
    }
    return 0;
}
As you can see, defining the iterator as a separate class means its type has to be spelled out separately as well, which is why it's usually defined as a nested class in its container.
There is also the <iterator> header, which provides some useful facilities for standard iterators, such as:
the std::next() function, which returns a copy of the iterator advanced by one step (or by n steps);
the std::prev() function, which returns a copy of the iterator moved one step back;
std::advance(it, n), which moves the iterator it n steps forward in place;
overloaded operators to compare iterator adaptors such as std::reverse_iterator;
other iteration-related additions.
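The helpers listed above can be sketched on a std::list, whose iterators do not support `it + n`. The function names here are hypothetical. Only `std::advance`, `std::next` and `std::prev` come from the standard library.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Return the element in the middle of a non-empty list.
int middle_element(const std::list<int>& values) {
    assert(!values.empty());
    auto it = values.begin();
    std::advance(it, values.size() / 2);  // moves `it` itself, returns nothing
    return *it;
}

// Return the second element; std::next returns an advanced copy
// and leaves the iterator passed to it untouched.
int second_element(const std::list<int>& values) {
    assert(values.size() >= 2);
    return *std::next(values.begin());
}

// Return the last element; std::prev steps back from end().
int last_element(const std::list<int>& values) {
    assert(!values.empty());
    return *std::prev(values.end());
}
```

For the list {10, 20, 30, 40, 50} these return 30, 20 and 50 respectively.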
If you need to write a custom iterator for your own highly specific container that isn't present in the STL, remember that iterators act as the intermediary between containers and algorithms. For your own container you should pick the proper iterator category (input, output, forward, bidirectional or random access) so that it supports a number of standard algorithms out of the box.
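As a rough sketch of what declaring a category looks like, the hypothetical `IntForwardList` below exposes a forward iterator with the member types that std::iterator_traits looks up, which is enough for algorithms such as std::find and std::accumulate. The container and helper names are assumptions made for this example, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <numeric>
#include <utility>

// Hypothetical minimal singly linked list of ints.
class IntForwardList {
    struct Node {
        int value;
        std::unique_ptr<Node> next;
    };
    std::unique_ptr<Node> head_;

public:
    class iterator {
        Node* node_ = nullptr;

    public:
        // Member types that std::iterator_traits reads.
        using iterator_category = std::forward_iterator_tag;
        using value_type = int;
        using difference_type = std::ptrdiff_t;
        using pointer = int*;
        using reference = int&;

        iterator() = default;
        explicit iterator(Node* node) : node_(node) {}

        reference operator*() const {
            assert(node_ != nullptr);  // dereferencing end() is a bug
            return node_->value;
        }
        pointer operator->() const { return &node_->value; }
        iterator& operator++() {
            node_ = node_->next.get();
            return *this;
        }
        iterator operator++(int) {
            iterator old = *this;
            ++*this;
            return old;
        }
        bool operator==(const iterator& other) const { return node_ == other.node_; }
        bool operator!=(const iterator& other) const { return node_ != other.node_; }
    };

    // Insert at the front; the previous head becomes the new node's successor.
    void push_front(int value) {
        head_ = std::unique_ptr<Node>(new Node{value, std::move(head_)});
    }
    iterator begin() { return iterator(head_.get()); }
    iterator end() { return iterator(nullptr); }
};

// Standard algorithms work because the iterator meets the forward requirements.
int sum(IntForwardList& list) {
    return std::accumulate(list.begin(), list.end(), 0);
}
bool contains(IntForwardList& list, int value) {
    return std::find(list.begin(), list.end(), value) != list.end();
}
```

A singly linked list can only move forward, so `std::forward_iterator_tag` is the strongest category it can honestly claim. Algorithms that need to step backwards or jump, such as std::sort, are deliberately out of reach.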

Related

Overloading push_back() in vector to allow non-duplicate elements

Can we overload the push_back() method in std::vector to allow non-duplicate elements? I know std::set and std::unordered_set are supposed to avoid duplicate elements, but std::set sorts the elements and std::unordered_set stores the elements in no particular order. I need to retrieve the elements in the order they are inserted, while ensuring duplicate elements are not inserted.
Edit: There's a possible duplicate for this question here. The best solution to this duplicate proposes an auxiliary data structure and another custom method, "add". This doesn't look good to me because, even if I document it separately, users inserting data into a std::vector rarely check the documentation for custom functions. If there's no efficient way, though, this can be a last resort.
Many people advise against it, but it seems there's some kind of urban legend going around that doing so will cause the universe to undergo vacuum decay and reality as we know it will dissolve.
You can publicly inherit from std::vector. But you have to think about what you can do with that.
If you inherit from vector, it is highly recommended that you don't add any data members to it. This can cause object slicing (google "c++ object slicing".) You also need to keep in mind that vector is not using virtual functions. That means you cannot override member functions. You can only shadow them, so it's not guaranteed that it will always be your push_back() function that gets called. The original will get called if you pass an object of your class to something that takes a reference to a vector, for example.
So in the end, you'd need to add a push_back_unique() function instead. But that in turn means the job can be done by a simple free function, so inheriting from vector isn't needed. This of course means there's never a guarantee that the elements in the vector will be unique. Other code might use push_back() instead somewhere.
Inheriting vector makes sense if you want to add completely new convenience functions that don't impose or lift any restrictions that vector has. If you want something that looks like a vector but really isn't (because it has different behavior and/or restrictions), you should implement your own type that delegates the container functionality to vector by either inheriting privately from it, or by having it as a private data member, and then replicate the vector API through public wrapper functions.
But this is very tedious to implement. Usually, you don't really need all the API from vector. So I'd say just write a smaller class around vector that only provides the functionality you need. And that functionality sounds like it's going to be pretty much read-only, since allowing write access to the elements allows for setting an element to the same value as another, breaking the container's uniqueness. So you could do something like:
template<typename T>
class UniqueVector
{
public:
void push_back(T&& elem)
{
if (std::find(vec_.begin(), vec_.end(), elem) == vec_.end()) {
vec_.push_back(std::move(elem)); // T&& is an rvalue reference here, so std::move (from <utility>) applies
}
}
const T& operator[](size_t index) const
{
return vec_[index];
}
auto begin() const
{
return vec_.cbegin();
}
auto end() const
{
return vec_.cend();
}
private:
std::vector<T> vec_;
};
If you still want to allow write access to individual elements, then you can provide non-const functions that check if the value that is passed is already in the vector. Like:
void assign_if_unique(size_t index, T&& value)
{
if (std::find(vec_.begin(), vec_.end(), value) == vec_.end()) {
vec_[index] = std::move(value); // T&& is an rvalue reference here, so std::move (from <utility>) applies
}
}
This is a minimal example. You should obviously add the functions you actually want. Like size(), empty(), and whatever else you need.
You should first define a free function (see note 1 below) to implement your feature:
template<class T>
std::vector<T>&
push_back_unique(std::vector<T>& dest, T const& src)
{ /* ... */ }
If you use this a lot, and if it makes sense for your program, you might want to define an operator to do so:
template<class T>
std::vector<T>& operator<<(std::vector<T>& dest, T const& src)
{ return push_back_unique(dest, src); }
This allows:
std::vector<int> data;
data << 5 << 8 << 13 << 5 << 21;
for (auto n : data) std::cout << n << " "; // prints 5 8 13 21
1) This is because inheriting from standard containers is often bad practice and brings pitfalls.
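The body of push_back_unique is elided above. One possible way to fill it in, assuming a plain linear search with std::find is acceptable for the sizes involved, is sketched below. The demo function is only there to show the expected behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Append src to dest only if no equal element is already stored.
// This is a linear search, so each insertion costs O(n).
template <class T>
std::vector<T>& push_back_unique(std::vector<T>& dest, T const& src) {
    if (std::find(dest.begin(), dest.end(), src) == dest.end()) {
        dest.push_back(src);
    }
    return dest;
}

// Self-check: insertion order is kept and duplicates are skipped.
void push_back_unique_demo() {
    std::vector<int> data;
    push_back_unique(data, 5);
    push_back_unique(data, 8);
    push_back_unique(data, 5);  // duplicate, ignored
    push_back_unique(data, 13);
    assert((data == std::vector<int>{5, 8, 13}));
}
```

For large vectors the repeated linear search becomes the bottleneck. In that case an auxiliary std::unordered_set of the values already seen, as the linked duplicate suggests, trades memory for faster membership checks.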

Optimize search in std::deque

I'm writing a program that has different kinds of objects, all of which derive from a common base class with virtual methods. I'm doing this for the advantages of polymorphism, which lets a manager class call a certain method on all the objects without checking the specific kind of each object.
The point is that the different kinds of objects sometimes need to get a list of objects of a certain type.
At that moment my manager class loops through all the objects and checks the type of each one. It builds a list and returns it like this:
std::list<std::shared_ptr<Object>> ObjectManager::GetObjectsOfType(std::string type)
{
std::list<std::shared_ptr<Object>> objectsOfType;
for (int i = 0; i < m_objects.size(); ++i)
{
if (m_objects[i]->GetType() == type)
{
objectsOfType.push_back(m_objects[i]);
}
}
return objectsOfType;
}
m_objects is a deque. I know iterating over a data structure is normally expensive, but I want to know if it is possible to polish it a little, because right now this function takes a third of the program's total run time.
My question is: is there any design pattern or function that I'm not taking into account that would reduce the cost of this operation in my program?
In the code as given, there is just a single optimization that can be done locally:
for (auto const& obj : m_objects)
{
if (obj->GetType() == type)
{
objectsOfType.push_back(obj);
}
}
The rationale is that operator[] is generally not the most efficient way to access a deque. Having said that, I don't expect a major improvement. Your locality of reference is very poor: You're essentially looking at two dereferences (shared_ptr and string).
A logical approach would be to make m_objects a std::multimap keyed by type.
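A minimal sketch of the multimap idea follows. The `Object` and `ObjectManager` classes here are simplified stand-ins for the ones in the question, and the `Add` method is an assumption about how objects get registered.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the question's Object base class.
class Object {
public:
    explicit Object(std::string type) : type_(std::move(type)) {}
    virtual ~Object() = default;
    const std::string& GetType() const { return type_; }

private:
    std::string type_;
};

class ObjectManager {
public:
    // Register an object under its type (hypothetical method name).
    void Add(std::shared_ptr<Object> object) {
        assert(object != nullptr);
        std::string type = object->GetType();  // copy the key before moving the pointer
        m_objects.emplace(std::move(type), std::move(object));
    }

    // Only the entries stored under `type` are visited.
    std::vector<std::shared_ptr<Object>> GetObjectsOfType(const std::string& type) const {
        std::vector<std::shared_ptr<Object>> result;
        auto range = m_objects.equal_range(type);
        for (auto it = range.first; it != range.second; ++it) {
            result.push_back(it->second);
        }
        return result;
    }

private:
    std::multimap<std::string, std::shared_ptr<Object>> m_objects;
};
```

With this layout the lookup cost depends on the number of matching objects plus a logarithmic search, instead of on the total number of objects. The trade-off is that iterating over all objects now visits them grouped by type rather than in insertion order.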
Some things you can do to speed this up:
Store the type in the base class; this removes a somewhat expensive virtual lookup.
If the type is a string or similar, change it to a simple type like an enum or int.
A vector is more efficient to traverse than a deque.
If staying with deque, use iterators or a range-based for loop to avoid the random lookups (which are more expensive in a deque).
Range based looks like this:
for (auto const& obj : m_objects)
{
if (obj->GetType() == type)
{
objectsOfType.push_back(obj);
}
}
Update: I would also recommend against using a std::list (unless for some reason you have to), as it does not perform well in many cases. Again, std::vector springs to the rescue!
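Put together, the suggestions in this answer might look roughly like the sketch below. The `ObjectType` enumerators and the free-function form of `GetObjectsOfType` are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical set of types; an enum compares much faster than a string.
enum class ObjectType { Player, Enemy, Wall };

class Object {
public:
    explicit Object(ObjectType type) : type_(type) {}
    virtual ~Object() = default;
    // Stored in the base class and non-virtual, so no virtual dispatch is needed.
    ObjectType GetType() const { return type_; }

private:
    ObjectType type_;
};

// Filter a vector with a range-based for loop and return a vector.
std::vector<std::shared_ptr<Object>> GetObjectsOfType(
        const std::vector<std::shared_ptr<Object>>& objects, ObjectType type) {
    std::vector<std::shared_ptr<Object>> result;
    for (auto const& obj : objects) {
        assert(obj != nullptr);
        if (obj->GetType() == type) {
            result.push_back(obj);
        }
    }
    return result;
}
```

Whether this is fast enough is something only a profiler run on the real program can confirm.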

Programming a simple object oriented graph in C++

I am really trying to be a better programmer, and to make more modular, organized code.
As an exercise, I was trying to make a very simple Graph class in C++ with STL. In the code below, my Node object does not compile because the commented line results in a reference to a reference in STL.
#include <set>
class KeyComparable
{
public:
int key;
};
bool operator <(const KeyComparable & lhs, const KeyComparable & rhs)
{
return lhs.key < rhs.key;
}
class Node : public KeyComparable
{
public:
// the following line prevents compilation
// std::set<Node &> adjacent;
};
I would like to store the edges in a set (by key) because it allows fast removal of edges by key. If I were to store list<Node*>, that would work fine, but it wouldn't allow fast deletion by key.
If I use std::set<Node>, changes made through an edge will only change the local copy (not actually the adjacent Node). If I use std::set<Node*>, I don't believe the < operator will work because it will operate on the pointers themselves, and not the memory they index.
I considered wrapping references or pointers in another class, possibly my KeyComparable class (according to the linked page, this is how boost handles it).
Alternatively, I could store a std::list<Node*> and a std::map<int, iterator> of locations in the std::list. I'm not sure if the iterators will stay valid as I change the list.
Ages ago, everything here would just be pointers and I'd handle all the data structures manually. But I'd really like to stop programming C-style in every language I use, and actually become a good programmer.
What do you think is the best way to handle this problem? Thanks a lot.
As you have deduced, you can't store references in STL containers because one of the requirements of items stored is that they be assignable. It's the same reason why you can't store arrays in STL containers. You also can't overload operators unless at least one operand is a user-defined type, which makes it appear that you can't do custom comparisons if you store pointers in an STL class...
However, you can still use std::set with pointers if you give set a custom comparer functor:
struct NodePtrCompare {
bool operator()(const Node* left, const Node* right) const {
return left->key < right->key;
}
};
std::set<Node*, NodePtrCompare> adjacent;
And you still get fast removals by key like you want.
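A small sketch of how this fits together: the comparer is declared before Node so that Node can hold the set, and removal by key uses a temporary probe node. The `remove_edge` helper and the demo are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <set>

struct Node;

// Orders Node pointers by the key of the node they point to.
struct NodePtrCompare {
    bool operator()(const Node* left, const Node* right) const;
};

struct Node {
    int key;
    std::set<Node*, NodePtrCompare> adjacent;
};

inline bool NodePtrCompare::operator()(const Node* left, const Node* right) const {
    return left->key < right->key;
}

// Remove the edge to the neighbour with the given key, if present.
// A temporary probe node carries the key for the lookup.
bool remove_edge(Node& from, int key) {
    Node probe{key, {}};
    return from.adjacent.erase(&probe) == 1;
}

// Self-check: edges refer to the real nodes, and removal by key works.
void graph_demo() {
    Node a{1, {}}, b{2, {}}, c{3, {}};
    a.adjacent.insert(&b);
    a.adjacent.insert(&c);
    (*a.adjacent.begin())->adjacent.insert(&a);  // changes b itself, not a copy
    assert(b.adjacent.size() == 1);
    bool removed = remove_edge(a, 2);
    assert(removed);
    (void)removed;  // silences the unused warning when asserts are disabled
    assert(a.adjacent.size() == 1);
}
```

One caveat: a node's key must not change while the node is stored in any set, because the set's ordering depends on it.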

Hybrid vector/list container?

I'm in need of a container that has the properties of both a vector and a list. I need fast random access to elements within the container, but I also need to be able to remove elements in the middle of the container without moving the other elements. I also need to be able to iterate over all elements in the container, and see at a glance (without iteration) how many elements are in the container.
After some thought, I've figured out how I could create such a container, using a vector as the base container, and wrapping the actual stored data within a struct that also contained fields to record whether the element was valid, and pointers to the next/previous valid element in the vector. Combined with some overloading and such, it sounds like it should be fairly transparent and fulfill my requirements.
But before I actually work on creating yet another container, I'm curious if anyone knows of an existing library that implements this very thing? I'd rather use something that works than spend time debugging a custom implementation. I've looked through the Boost library (which I'm already using), but haven't found this in there.
If the order does not matter, I would just use a hash table mapping integers to pointers. std::tr1::unordered_map<int, T *> (or std::unordered_map<int, unique_ptr<T>> if C++0x is OK).
The hash table's elements can move around which is why you need to use a pointer, but it will support very fast insertion / lookup / deletion. Iteration is fast too, but the elements will come out in an indeterminate order.
Alternatively, I think you can implement your own idea as a very simple combination of a std::vector and a std::list. Just maintain both a list<T> my_list and a vector<list<T>::iterator> my_vector. To add an object, push it onto the back of my_list and then push its iterator onto my_vector. (Set an iterator to my_list.end() and decrement it to get the iterator for the last element.) To look up an element, index into the vector and just dereference the iterator. To delete, remove from the list (which you can do by iterator) and set the location in the vector to my_list.end().
std::list guarantees the elements within will not move when you delete them.
[update]
I am feeling motivated. First pass at an implementation:
#include <vector>
#include <list>
template <typename T>
class NairouList {
public:
typedef std::list<T> list_t;
typedef typename list_t::iterator iterator;
typedef std::vector<iterator> vector_t;
NairouList() : my_size(0)
{ }
void push_back(const T &elt) {
my_list.push_back(elt);
iterator i = my_list.end();
--i;
my_vector.push_back(i);
++my_size;
}
T &operator[](typename vector_t::size_type n) {
if (my_vector[n] == my_list.end())
throw "Dave's not here, man";
return *(my_vector[n]);
}
void remove(typename vector_t::size_type n) {
my_list.erase(my_vector[n]);
my_vector[n] = my_list.end();
--my_size;
}
size_t size() const {
return my_size;
}
iterator begin() {
return my_list.begin();
}
iterator end() {
return my_list.end();
}
private:
list_t my_list;
vector_t my_vector;
size_t my_size;
};
It is missing some Quality of Implementation touches... Like, you probably want more error checking (what if I delete the same element twice?) and maybe some const versions of operator[], begin(), end(). But it's a start.
That said, for "a few thousand" elements a map will likely serve at least as well. A good rule of thumb is "Never optimize anything until your profiler tells you to".
Looks like you might be wanting a std::deque. Removing an element is not as efficient as with a std::list, but because deques are typically implemented using non-contiguous memory "blocks" that are managed via an additional pointer array/vector internal to the container (each "block" would be an array of N elements), removing an element inside a deque only needs to shift the elements between it and the nearer end, rather than everything after it as with a vector.
Edit: On second thought, and after reviewing some of the comments, while I think a std::deque could work, I think a std::map or std::unordered_map will actually be better for you, since it will allow the array-syntax indexing you want, yet give you fast removal of elements as well.

Changing object members via an iterator?

Been a while since I've used C++. Can I do something like this?:
for (vector<Node>::iterator n = active.begin(); n!=active.end(); ++n) {
n->ax /= n->m;
}
where Node is an object with a few floats in it?
If written in Java, what I'm trying to accomplish is something similar to:
for (Node n : this.active) {
n.ax /= n.m;
}
where active is an arrayList of Node objects.
I think I am forgetting some quirk about passing by reference or something (throws hands in the air in desperation).
Yes. This syntax basically works for almost all STL containers.
// this will walk the container from the beginning to the end.
for(container::iterator it = object.begin(); it != object.end(); it++)
{
}
object.begin() - gives an iterator to the first element of the container.
object.end() - the iterator equals this value once it has gone past all the elements. Note that we use != to check for the end.
operator ++ - moves the iterator to the next element.
Based on the type of container you may have other ways to navigate the iterator (say backwards, randomly to a spot in the container, etc). A good introduction to iterators is here.
Short answer: yes, you can.
The iterator is a proxy for the container element. In some cases the iterator is literally just a pointer to the element.
Your code works fine for me:
#include <vector>
using std::vector;
struct Node{
double ax;
double m;
};
int main()
{
vector<Node> active;
for (vector<Node>::iterator n = active.begin(); n!=active.end(); ++n) {
n->ax /= n->m;
}
}
You can safely change an object contained in a container without invalidating iterators (with the associative containers, this applies only to the 'value' part of the element, not the 'key' part).
However, what you might be thinking of is that if you change the container (say by deleting or moving the element), then existing iterators might be invalidated, depending on the container, the operation being performed, and the details of the iterators involved (which is why you aren't allowed to change the 'key' of an object in an associative container - that would necessitate moving the object in the container in the general case).
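To make the distinction concrete, here is a sketch using the Node struct from the question. The function names are made up. Changing an element through an iterator is safe, while erasing from a std::vector invalidates the iterator at the erased position, so the loop continues from the iterator that erase() returns.

```cpp
#include <cassert>
#include <vector>

struct Node {
    double ax;
    double m;
};

// Changes the container: erase() invalidates `it`, so use its return value.
void remove_massless(std::vector<Node>& active) {
    for (auto it = active.begin(); it != active.end(); ) {
        if (it->m == 0.0) {
            it = active.erase(it);  // returns the iterator to the next element
        } else {
            ++it;
        }
    }
}

// Changes only the elements: iterators stay valid throughout.
void scale_by_mass(std::vector<Node>& active) {
    for (auto it = active.begin(); it != active.end(); ++it) {
        assert(it->m != 0.0);  // holds once remove_massless has run
        it->ax /= it->m;
    }
}
```

Calling `remove_massless` before `scale_by_mass` also avoids the division by zero that the original loop would hit for a node with m equal to 0.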
In the case of std::vector, yes, you can manipulate the object simply by dereferencing the iterator. In the case of sorted containers, such as a std::set, you can't do that. Since a set inserts its elements in order, directly manipulating the contents isn't permitted since it might mess up the ordering.
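For a std::set, the usual workaround is to erase the old value and insert the new one, so the set can place it at the correct position. A minimal sketch, with a hypothetical helper name:

```cpp
#include <cassert>
#include <set>

// Replace old_value with new_value in a std::set.
// Elements are read-only through set iterators, so the value is
// erased and inserted again instead of being modified in place.
bool replace_in_set(std::set<int>& values, int old_value, int new_value) {
    auto it = values.find(old_value);
    if (it == values.end()) {
        return false;  // nothing to replace
    }
    values.erase(it);
    assert(values.count(old_value) == 0);
    values.insert(new_value);  // the set puts it where the ordering requires
    return true;
}
```

If `new_value` is already present, the insert is a no-op and the set ends up one element smaller, which matches the set's uniqueness guarantee.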
What you have will work. You can also use one of the many STL algorithms to accomplish the same thing.
std::for_each(active.begin(), active.end(), [](Node &n){ n.ax /= n.m; });