Understanding Iterators in the STL - c++

What exactly are iterators in the C++ STL?
In my case, I'm using a list, and I don't understand why you have to make an iterator std::list <int>::const_iterator iElementLocator; to display contents of the list by the derefrence operator:
cout << *iElementLocator; after assigning it to maybe list.begin().
Please explain what exactly an iterator is and why I have to dereference or use it.

There are three building blocks in the STL:
Containers
Algorithms
Iterators
At the conceptual level containers hold data. That by itself isn't very useful, because you want to do something with the data; you want to operate on it, manipulate it, query it, play with it. Algorithms do exactly that. But algorithms don't hold data, they have no data -- they need a container for this task. Give a container to an algorithm and you have an action going on.
The only problem left to solve is how does an algorithm traverse a container, from a technical point of view. Technically a container can be a linked list, or it can be an array, or a binary tree, or any other data structure that can hold data. But traversing an array is done differently than traversing a binary tree. Even though conceptually all an algorithm wants is to "get" one element at a time from a container, and then work on that element, the operation of getting the next element from a container is technically very container-specific.
It appears as if you'd need to write the same algorithm for each container, so that each version of the algorithm has the correct code for traversing the container. But there's a better solution: ask the container to return an object that can traverse over the container. The object would have an interface algorithms know. When an algorithm asks the object to "get the next element" the object would comply. Because the object came directly from the container it knows how to access the container's data. And because the object has an interface the algorithm knows, we need not duplicate an algorithm for each container.
This is the iterator.
The iterator here glues the algorithm to the container, without coupling the two. An iterator is coupled to a container, and an algorithm is coupled to the iterator's interface. The source of the magic here is really template programming. Consider the standard copy() algorithm:
template<class In, class Out>
Out copy(In first, In last, Out res)
{
while( first != last ) {
*res = *first;
++first;
++res;
}
return res;
}
The copy() algorithm takes as parameters two iterators templated on the type In and one iterator of type Out. It copies the elements starting at position first and ending just before position last, into res. The algorithm knows that to get the next element it needs to say ++first or ++res. It knows that to read an element it needs to say x = *first and to write an element it needs to say *res = x. That's part of the interface algorithms assume and iterators commit to. If by mistake an iterator doesn't comply with the interface then the compiler would emit an error for calling a function over type In or Out, when the type doesn't define the function.

I'm being lazy. So I would not type describing what an iterator is and how they're used, especially when there're already lots of articles online that you can read yourself.
Here are few that I can quote for a start, proividing the links to complete articles:
MSDN says,
Iterators are a generalization of
pointers, abstracting from their
requirements in a way that allows a
C++ program to work with different
data structures in a uniform manner.
Iterators act as intermediaries
between containers and generic
algorithms. Instead of operating on
specific data types, algorithms are
defined to operate on a range
specified by a type of iterator. Any
data structure that satisfies the
requirements of the iterator may then
be operated on by the algorithm. There
are five types or categories of
iterator [...]
By the way, it seems the MSDN has taken the text in bold from C++ Standard itself, specifically from the section §24.1/1 which says
Iterators are a generalization of
pointers that allow a C + + program to
work with different data structures
(containers) in a uniform manner. To
be able to construct template
algorithms that work correctly and
efficiently on different types of data
structures, the library formalizes not
just the interfaces but also the
semantics and complexity assumptions
of iterators. All iterators i support
the expression *i, resulting in a
value of some class, enumeration, or
built-in type T, called the value type
of the iterator. All iterators i for
which the expression (*i).m is
well-defined, support the expression
i->m with the same semantics as
(*i).m. For every iterator type X for
which equality is defined, there is a
corresponding signed integral type
called the difference type of the
iterator.
cplusplus says,
In C++, an iterator is any object
that, pointing to some element in a
range of elements (such as an array or
a container), has the ability to
iterate through the elements of that
range using a set of operators (at
least, the increment (++) and
dereference (*) operators).
The most obvious form of iterator is a
pointer [...]
And you can also read these:
What Is an Iterator?
Iterators in the Standard C++ Library
Iterator (at wiki entry)
Have patience and read all these. Hopefully, you will have some idea what an iterator is, in C++. Learning C++ requires patience and time.

An iterator is not the same as the container itself. The iterator refers to a single item in the container, as well as providing ways to reach other items.
Consider designing your own container without iterators. It could have a size function to obtain the number of items it contains, and could overload the [] operator to allow you to get or set an item by its position.
But "random access" of that kind is not easy to implement efficiently on some kinds of container. If you obtain the millionth item: c[1000000] and the container internally uses a linked list, it will have to scan through a million items to find the one you want.
You might instead decide to allow the collection to remember a "current" item. It could have functions like start and more and next to allow you to loop through the contents:
c.start();
while (c.more())
{
item_t item = c.next();
// use the item somehow
}
But this puts the "iteration state" inside the container. This is a serious limitation. What if you wanted to compare each item in the container with every other item? That requires two nested loops, both iterating through all the items. If the container itself stores the position of the iteration, you have no way to nest two such iterations - the inner loop will destroy the working of the outer loop.
So iterators are an independent copy of an iteration state. You can begin an iteration:
container_t::iterator i = c.begin();
That iterator, i, is a separate object that represents a position within the container. You can fetch whatever is stored at that position:
item_t item = *i;
You can move to the next item:
i++;
With some iterators you can skip forward several items:
i += 1000;
Or obtain an item at some position relative to the position identified by the iterator:
item_t item = i[1000];
And with some iterators you can move backwards.
And you can discover if you've reached beyond the contents of the container by comparing the iterator to end:
while (i != c.end())
You can think of end as returning an iterator that represents a position that is one beyond the last position in the container.
An important point to be aware of with iterators (and in C++ generally) is that they can become invalid. This usually happens for example if you empty a container: any iterators pointing to positions in that container have now become invalid. In that state, most operations on them are undefined - anything could happen!

An iterator is to an STL container what a pointer is to an array. You can think of them as pointer objects to STL containers. As pointers, you will be able to use them with the pointer notation (e.g. *iElementLocator, iElementLocator++). As objects, they will have their own attributes and methods (http://www.cplusplus.com/reference/std/iterator).

There already exists a lot of good explanations of iterators. Just google it.
One example.
If there is something specific you don't understand come back and ask.

I'd suggest reading about operator overloading in C++. This will tell why * and -> can mean essentially anything. Only then you should read about the iterators. Otherwise it might appear very confusing.

Related

STL iterator revalidation for end (past-the-end) iterator?

See related questions on past-the-end iterator invalidation:
this, this.
This is more a question of design, namely, is there (in STL or elsewhere) such concept as past-the-end iterator "revalidation"?
What I mean by this, and use case: suppose an algorithm needs to "tail" a container (such as a queue). It traverses the container until end() is reached, then pauses; independently from this, another part of the program enqueues more items in the queue. How is it possible for the algorithm to (EDIT) efficiently tell, "have more items been enqueued" while holding the previously past-the-end iterator (call it tailIt)? (this would imply it is able to check if tailIt == container.end() still, and if that is false, conclude tailIt is now valid and points to the first element that was inserted).
Please don't dismiss the question as "no, there isn't" - I'm looking to form judgment around how to design some logic in an idiomatic way, and have many options (in fact the iterators in question are to a hand-built data structure for which I can provide this property - end() revalidation - but I would like to judge if it is a good idea).
EDIT: made it clear we have the iterator tailIt and a reference to container. A trivial workaround for what I'm trying to do is, also remember count := how many items you processed, and then check is container.size() == count still, and if not, seek to container[count] and continue processing from there. This comes with many disadvantages (extra state, assumption container doesn't pop from the front (!), random-access for efficient seeking).
Not in general. Here are some issues with your idea:
Some past-the-end iterators don't "point" to the data block at all; in fact this will be true of any iterator except a vector iterator. So, overall, an extant end-iterator just is never going to become a valid iterator to data;
Iterators often become invalidated when the container changes — while this isn't always true, it also precludes a general solution that relies on dereferencing some iterator from before the mutation;
Iterator validity is non-observable — you already need to know, before you dereference an iterator, whether or not it is valid. This is information that comes from elsewhere, usually your brain… by that I mean the developer must read the code and make a determination based on its structure and flow.
Put all these together and it is clear that the end iterator simply cannot be used this way as the iterator interface is currently designed. Iterators refer to data in a range, not to a container; it stands to reason, then, that they hold no information about a container, and if the container causes the range to change there's no entity that the iterator knows about that it can ask to find this out.
Is the described logic possible to create? Certainly! But with a different iterator interface (and support from the container). You could wrap the container in your own class type to do this. However, I advise against making things that look like standard iterators but behave differently; this will be very confusing.
Instead, encapsulate the container and provide your own wrapper function that can directly perform whatever post-enqueuement action you feel you need. You shouldn't need to watch the state of the end iterator to achieve your goal.
In the case for a std::queue, no there isn't (heh). Not because the iterators for a queue get invalidated once something is pushed, but because a queue doesn't have any iterators at all.
As for other iterator types, most (or any of them) of them don't require a reference to the container holder (the managing object containing all the info about the underlying data). Which is an trade-off for efficiency over flexibility. (I quickly checked the implementation of gcc's std::vector::iterator)It is possible to write an implementation for an iterator type that keeps a reference to the holder during its lifetime, that way the iterators never have to be invalidated! (unless the holder is std::move'd)
Now to throw in my professional opinion, I wouldn't mind seeing a safe_iterator/flex_iterator for cases where the iterator normally would be invalidated during iterations.
Possible user interface:
for (auto v : make_flex_iterator(my_vector)) {
if (some_outside_condition()) {
// Normally the vector would be invalidated at this point
// (only if resized, but you should always assume a resize)
my_vector.push_back("hello world!");
}
}
Literally revalidating iterators might be too complex to build for it's use case (I wouldn't know where to begin), but designing an iterator which simply never invalidates is quite trivial, with only as much overhead as a for (size_t i = 0; i < c.size(); i++); loop.But with that said, I cannot assure you how well the compiler will optimize, like unrolling loops, with these iterators. I do assume it will still do quite a good job.

Retrieving data from vectors and unordered_sets

I've started to put data into vectors and unordered_sets. I've worked out fairly easily how to put the data in, and I know how to get the data out if I need to unload all the data, for example:
for (auto i : vehicles)
MakeSpawnInfoVehicle(i.AddedInformation);
However, I've reached a point where I want just one element of information from either the unordered_sets or vectors such as what ever the 10th entry is in the vector or the unordered_set.
If someone could provide a basic example of both, I'm sure I'll understand it.
In situations like this, you can consult a good reference.
std::vector provides operator[] with constant-time performance:
auto tenth_element = vehicle_vector[9];
std::unordered_set is optimised for element-based lookup (that is, testing whether an element is present or not). It is rare you would need the 10th element (especially since the set is, by definition, unordered), but if you do, you take an iterator, increment it and dereference it:
auto tenth_element = *std::next(vehicle_set.begin(), 9);
In general, you access the elements of a container either through the container's member functions (such as the vector's operator [] above), or through an iterator.
An iterator is a generalisation of a pointer - it's some unspecified type which points to an element in the container. You can then work with iterators using their member functions like operator ++ or free functions like std::next(). Random-access iterators also support []. To get the element to which the iterator "points," dereference the iterator like *it or it->whatever.
You obtain an iterator to the beginning of the container using begin() and an iterator to one past the last element using end(). Containers can also provide other member functions to get an iterator to an element - such as vehicle_set.find(aVehicle), which returns an iterator to aVehicle if it's present in the set, or the end() iterator if it's not.
Which member functions a container offers depends on which container it is and in particular how efficient the operations are. std::vector doesn't proivde find(), because it would be no better than std::find() - there is no structure in std::vector for fast lookup. std::unordered_set does provide find(), but doesn't have operator [], because its iterators are not random-access: accessing the nth element of an unordered set requires time proportional to n.

I want a simplest visualization for STL containers

I have read many things about STL containers (Sequence Containers, Associative Container, Container adapters) but I'm still don't understand the internal form for each of them.
I want visualization form supported with pictures to every container in STL, (ex: what is the form of data inside the container) if is possible.
I hope that my question be clear.
The internal working of the containers is implementation independent, if the performance (Big-O) is correct and the preconditions and post-conditions of the container and his methods (ex: default constructed = empty container) are meet, the implementation is standard.
Example in VS:
1-vector of T - buffer (T[], T*). Normal internal representation is 3 pointer, begin and end of the buffer and current element.
2-deque of T - implemented in similar way to vector, but inserted element not need to be one after the other, because the container support inserting before and after with O(1) normally if you create deque it begin empty as all container, suppose a value is inserted from back, in this case is inserted in position 0, if a value is inserted from front is inserted in position capacity of deque (until the two index begin and end encounter, in this case grow is needed, and copy, like vector, and order of the value updated). Normal internal representation is pointer to begin and end of the buffer, first and last element of the deque.
3-list of T - double linked list (node with next, previous and T)
4-asociative of T (set, multiset, map, multimap) - Red Black Tree with value T or pair in case of maps like. Normal internal representation is left and right Node and value (used for sort)
5-unordered asociative of T (unordered_set, multiset ...): hash map implementation new in C++11.
6-Adaptor - the underlying representation is one of the previous containers (ex: stack is an adapter that is implemented by default with deque), in instantiation time one can change the underlying representation to tweak performance Big-O according to constrain (it may require Random Access, Forward, etc...).
This specifics can change in any version of the libraries, when any optimization is founded, and in any other case. For example some implementation of vector have pointer to begin, capacity (instead of end), and current element.
http://www.cs.usfca.edu/~galles/visualization/Algorithms.html
This will help a little bit, although the underlying data structure can be vary upon implementation

Difference between std::remove and erase for vector?

I have a doubt that I would like to clarify in my head. I am aware of the different behavior for std::vector between erase and std::remove where the first physically removes an element from the vector, reducing size, and the other just moves an element leaving the capacity the same.
Is this just for efficiency reasons? By using erase, all elements in a std::vector will be shifted by 1, causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy, that difference might matter, right?
Is this just for efficiency reason? By using erase all elements in a std::vector will be shifted by 1 causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy that difference mihgt matter, right?
The reason for using this idiom is exactly that. There is a benefit in performance, but not in the case of a single erasure. Where it does matter is if you need to remove multiple elements from the vector. In this case, the std::remove will copy each not removed element only once to its final location, while the vector::erase approach would move all of the elements from the position to the end multiple times. Consider:
std::vector<int> v{ 1, 2, 3, 4, 5 };
// remove all elements < 5
If you went over the vector removing elements one by one, you would remove the 1, causing copies of the remainder elements that get shifted (4). Then you would remove 2 and shift all remainding elements by one (3)... if you see the pattern this is a O(N^2) algorithm.
In the case of std::remove the algorithm maintains a read and write heads, and iterates over the container. For the first 4 elements the read head will be moved and the element tested, but no element is copied. Only for the fifth element the object would be copied from the last to the first position, and the algorithm will complete with a single copy and returning an iterator to the second position. This is a O(N) algorithm. The later std::vector::erase with the range will cause destruction of all the remainder elements and resizing the container.
As others have mentioned, in the standard library algorithms are applied to iterators, and lack knowledge of the sequence being iterated. This design is more flexible than other approaches on which algorithms are aware of the containers in that a single implementation of the algorithm can be used with any sequence that complies with the iterator requirements. Consider for example, std::remove_copy_if, it can be used even without containers, by using iterators that generate/accept sequences:
std::remove_copy_if(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, " "),
[](int x) { return !(x%2); } // is even
);
That single line of code will filter out all even numbers from standard input and dump that to standard output, without requiring the loading of all numbers into memory in a container. This is the advantage of the split, the disadvantage is that the algorithms cannot modify the container itself, only the values referred to by the iterators.
std::remove is an algorithm from the STL which is quite container agnostic. It requires some concept, true, but it has been designed to also work with C arrays, which are static in sizes.
std::remove simply returns a new end() iterator to point to one past the last non-removed element (the number of items from the returned value to end() will match the number of items to be removed, but there is no guarantee their values are the same as those you were removing - they are in a valid but unspecified state). This is done so that it can work for multiple container types (basically any container type that a ForwardIterator can iterate through).
std::vector::erase actually sets the new end() iterator after adjusting the size. This is because the vector's method actually knows how to handle adjusting it's iterators (the same can be done with std::list::erase, std::deque::erase, etc.).
remove organizes a given container to remove unwanted objects. The container's erase function actually handles the "removing" the way that container needs it to be done. That is why they are separate.
I think it has to do with needing direct access to the vector itself to be able to resize it. std::remove only has access to the iterators, so it has no way of telling the vector "Hey, you now have fewer elements".
See yves Baumes answer as to why std::remove is designed this way.
Yes, that's the gist of it. Note that erase is also supported by the other standard containers where its performance characteristics are different (e.g. list::erase is O(1)), while std::remove is container-agnostic and works with any type of forward iterator (so it works for e.g. bare arrays as well).
Kind of. Algorithms such as remove work on iterators (which are an abstraction to represent an element in a collection) which do not necessarily know which type of collection they are operating on - and therefore cannot call members on the collection to do the actual removal.
This is good because it allows algorithms to work generically on any container and also on ranges that are subsets of the entire collection.
Also, as you say, for performance - it may not be necessary to actually remove (and destroy) the elements if all you need is access to the logical end position to pass on to another algorithm.
Standard library algorithms operate on sequences. A sequence is defined by a pair of iterators; the first points at the first element in the sequence, and the second points one-past-the-end of the sequence. That's all; algorithms don't care where the sequence comes from.
Standard library containers hold data values, and provide a pair of iterators that specify a sequence for use by algorithms. They also provide member functions that may be able to do the same operations as an algorithm more efficiently by taking advantage of the internal data structure of the container.

Use of iterators over array indices

I just wanted to know what is the main advantage of using the iterators over the array indices. I have googled but i am not getting the right answer.
I presume you are talking about when using a vector, right?
The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).
All of the standard containers provide the iterator concept. An iterator knows how to find the next element in the container, especially when the underlying structure isn't array-like. Array-style operator[] isn't provided by every container, so getting in the habit of using iterators will make more consistent-looking code, regardless of the container you choose.
You can abstract the collection implementation away.
To expand upon previous answers:
Writing a loop with operator[] constrains you to a container that supports [] and uses the same index/size type. Otherwise you'd need to rewrite every loop to change the container.
Even if your container supports [], it may not be optimal for sequential traversing. [] is basically a random-access operator, which is O(1) for vector but could be as bad as O(n) depending on the underlying container.
This is a minor point, but if you use iterators, your loop could be more easily moved to using the standard algorithms, e.g. std::for_each.
There are many data structures, e.g. hash tables and linked lists cannot be naturally or quickly indexed, but they are indeed traversable. Iterators act as an interface that let you walk on any data structure without knowing the actual implementation of the source.
The STL contains algorithms, such as transform and for_each that operate on containers. They don't accept indices, but use iterators.
Iterators help hide the container implementation and allow the programmer to focus more on the algorithm. The for_each function can be applied to anything that supports a forward iterator.
As well as the points in other answers, iterators can also be faster (specifically compared to operator[]), since they are essentially iteration by pointer. If you do something like:
for (int i = 0; i < 10; ++i)
{
my_vector[i].DoSomething();
}
Every iteration of the loop unnecessarily calculates my_vector.begin() + i. If you use iterators, incrementing the iterator means it's already pointing to the next element, so you don't need that extra calculation. It's a small thing, but can make a difference in tight loops.
One other slight difference is that you can't use erase() on an element in a vector by index, you must have an iterator. No big deal since you can always do "vect.begin() + index" as your iterator, but there are other considerations. For example, if you do this then you must always check your index against size() and not some variable you assigned that value.
None of that is really too much worth worrying about but if given the choice I prefer iterator access for the reasons already stated and this one.
I would say it's more a matter of consistency and code reuse.
Consistency in that you will use all other containers with iterators
Code reuse in that algorithms written for iterators cannot be used with the subscript operator and vice versa... and the STL has lots of algorithms so you definitely want to build on it.
Finally, I'd like to say that even C arrays have iterators.
const Foo* someArray = //...
const Foo* otherArray = new Foo[someArrayLength];
std::copy(someArray, someArray + someArrayLength, otherArray);
The iterator_traits class has been specialized so that pointers or a model of RandomAccessIterator.