Use of iterators over array indices - c++

I just wanted to know what is the main advantage of using the iterators over the array indices. I have googled but i am not getting the right answer.

I presume you are talking about when using a vector, right?
The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).

All of the standard containers provide the iterator concept. An iterator knows how to find the next element in the container, especially when the underlying structure isn't array-like. Array-style operator[] isn't provided by every container, so getting in the habit of using iterators will make more consistent-looking code, regardless of the container you choose.

You can abstract the collection implementation away.

To expand upon previous answers:
Writing a loop with operator[] constrains you to a container that supports [] and uses the same index/size type. Otherwise you'd need to rewrite every loop to change the container.
Even if your container supports [], it may not be optimal for sequential traversing. [] is basically a random-access operator, which is O(1) for vector but could be as bad as O(n) depending on the underlying container.
This is a minor point, but if you use iterators, your loop could be more easily moved to using the standard algorithms, e.g. std::for_each.

There are many data structures, e.g. hash tables and linked lists cannot be naturally or quickly indexed, but they are indeed traversable. Iterators act as an interface that let you walk on any data structure without knowing the actual implementation of the source.

The STL contains algorithms, such as transform and for_each that operate on containers. They don't accept indices, but use iterators.
Iterators help hide the container implementation and allow the programmer to focus more on the algorithm. The for_each function can be applied to anything that supports a forward iterator.

As well as the points in other answers, iterators can also be faster (specifically compared to operator[]), since they are essentially iteration by pointer. If you do something like:
for (int i = 0; i < 10; ++i)
{
my_vector[i].DoSomething();
}
Every iteration of the loop unnecessarily calculates my_vector.begin() + i. If you use iterators, incrementing the iterator means it's already pointing to the next element, so you don't need that extra calculation. It's a small thing, but can make a difference in tight loops.

One other slight difference is that you can't use erase() on an element in a vector by index, you must have an iterator. No big deal since you can always do "vect.begin() + index" as your iterator, but there are other considerations. For example, if you do this then you must always check your index against size() and not some variable you assigned that value.
None of that is really too much worth worrying about but if given the choice I prefer iterator access for the reasons already stated and this one.

I would say it's more a matter of consistency and code reuse.
Consistency in that you will use all other containers with iterators
Code reuse in that algorithms written for iterators cannot be used with the subscript operator and vice versa... and the STL has lots of algorithms so you definitely want to build on it.
Finally, I'd like to say that even C arrays have iterators.
const Foo* someArray = //...
const Foo* otherArray = new Foo[someArrayLength];
std::copy(someArray, someArray + someArrayLength, otherArray);
The iterator_traits class has been specialized so that pointers or a model of RandomAccessIterator.

Related

When the data structure is a template parameter, how can I tell if an operation will invalidate an iterator?

Specifically, I have a class which currently uses a vector and push_back. There is an element inside the vector I want to keep track of. Pushing back on the vector may invalidate the iterator, so I keep its index around. It's cheap to find the iterator again using the index. I can't reserve the vector as I don't know how many items will be inserted.
I've considered making the data structure a template parameter, and perhaps list may be used instead. In that case, finding an iterator from the index is not a trivial operation. Since pushing back on a list doesn't invalidate iterators to existing elements, I could just store this iterator instead.
But how can I code a generic class which handles both cases easily?
If I can find out whether push_back will invalidate the iterator, I could store the iterator and update it after each push_back by storing the distance from the beginning before the operation.
You should probably try to avoid this flexibility. Quote from Item 2 "Beware the illusion of container-independent code" from Effective STL by Scott Meyers:
Face the truth: it's not worth it. The different containers are
different, and they have strengths and weaknesses that vary in significant ways. They're not designed to be interchangeable, and
there's littel you can do to paper that over. If you try, you're
merely tempting fate, and fate doesn't like to be tempted.
If you really, positively, definitely have to maintain valid iterators, use std::list. If you also need to have random access, try Boost.MultiIndex (although you'll lose contiguous memory access).
If you look at the standard container adapators (std::stack, std::queue) you see that they support the intersection of the adaptable containers interfaces, not their union.
I'd create a second class, which responsibility would be to return the iterator you are interested in.
It should also be parametrized with the same template parameter, and then you can specialize it for any type you want (vector/list etc). So inside your specializations you can use any method you want.
So it's some traits-based solution.
If you really want to stick with the vector and have that functionality maybe take a look at
http://en.cppreference.com/w/cpp/container/vector/capacity function. wrap your push_backs in defined function or even better wrap whole std::vector in ur class and before push_backing compare capacity against size() to check if resize will happen.

why does it need it!=obj.end() although every iterator should know itself when to terminate?

the following question just shot through my head. For the c++ stl iterators, a common practice is to have something like:
for (iterator it=obj.begin(); it!=obj.end(); it++)
what I am wondering is actually that obj.begin() could have told "it" when to stop which would make the for loop look like:
for (iterator it=obj.begin(); !it.end(); it++)
The benefit would be to make the iterator more self contained, and one could save (iterator end()) in the container class.
Sometimes you want to do something other than iterate over the entire contents of a container. For example, you can create a pair of iterators that will iterate over only the first half of a container. So having a separate object to represent the end is more flexible since it allows the user more control of where to place the end of a range.
You are right that it is somewhat inconvenient for the most common case of iterating over everything. However, C++11 provides a range based for loop which makes looping over a whole container really easy, so like many things in programming, it's really just a matter of selecting the right construct to best express your intention.
If the iterator API would be designed like that, a pointer wouldn't be a valid iterator (since a pointer obviously wouldn't have an end()) method. So that would rule out the most straight-forward way to implement iterators for data structures with contiguous memory.
Some libraries provide "java-style iterators" that work like you describe.
However, the big problem with this scheme (hasNext(), etc) is that such iterators are classes.
For example, STL contains <algorithm> header that contains functions like std::copy, std::generate, std::sort, std::lower_bound, std::fill etc. All those routines use begin/end style iterators. As a result, you can use pointers with those functions. If those function were operating on iterators that are classes (i.e. if they called hasNext() instead of != end() internally), then you wouldn't be able to pass pointers into std::sort and such. In this scenario you would have to wrap everything into class, wasting your time, and losing access to <algorithm> is not worth minor convenience you'd get by adding atEnd() method to iterator class. That's probably the reason why iterators are compared to the end().
There is no requirement for iterator to "know" about the container. It could just know and care about an offset in a block of memory or a current node (or whatever is appropriate for the data structure being iterated on), without knowing anything about encompassing container. For example, a vector iterator could be implemented as a simple pointer, without knowing (by itself) where the vector ends.
Also, STL algorithms need to work on raw pointers as well, so iterators need to "mimic" pointers.
Ultimately it's just the decision made when the STL was invented (by Stepanov in the early 90s), and later ratified in the C++ standardization process, that an iterator would be a generalization of a pointer. From http://www.sgi.com/tech/stl/stl_introduction.html:
In the example of reversing a C array, the arguments to reverse are
clearly of type double*. What are the arguments to reverse if you are
reversing a vector, though, or a list? ... The answer is that the
arguments to reverse are iterators, which are a generalization of
pointers.
Iterators didn't have to be a generalization of pointers. The C++ standard libraries (and before that the STL) could in theory have used a different iteration model, in which an iteration would be represented either by a single iterator object or a single range object, instead of by a pair of iterators first and last.
I don't think there would be much performance difference. There certainly wouldn't be with modern C++ compilers. The STL (and the standard libraries based on it) always relied for performance on decent inlining by the compiler, and these classes wouldn't be any worse than what compilers already have to deal with in the bowels of the containers. Nor would there be any major difficulty in providing a simple wrapper that turns a pair of pointers into an iterator or range object.
Some people do prefer other iterator models - James Gosling for one (or if not him, whoever designed Java iterators). And some people prefer ranges (including plenty of C++ programmers: hence Boost.Range).
I suspect the authors of the STL and C++ liked the fact that STL-style iterators retained a kind of compatibility with C, that you could take a C algorithm that used pointers to operate on an array (or other range specified by the user with a pair of pointers), and transform it almost unchanged into a C++ algorithm that uses iterators to operate on a container (or other range). That's the kind of thinking that means your new language can be more easily assimilated by an existing user-base, which was one of the goals of C++ early on.
So questions like, "can we provide a means of testing for end which is a few characters shorter, at the cost of making pointers no longer be iterators and making iterators larger" probably would not have got much interest at the time. But that was then, and for example Andrei Alexandrescu has been banging on for a while now that in his opinion this is not longer the best choice, and that ranges are better than iterators.
Having the iterator work the way it does gives you the flexibility to work on subranges. You don't always have to start at begin() or end at end(). For example if you want to work with all the iterators containing a certain value you can use the return value of equal_range() for your beginning and ending.
The D language has a range which incorporates both the beginning and end of an iterator range. It was also proposed for C++: http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/7e52451ed8eea31c
Knowing where your iterator's begin and end (and rbegin and rend) are is certainly useful but the case you state is somewhat handled by C++11's range-based for-loop:
for (auto it: obj)
{
// Do something with *it;
}
It's much easier to get things done in C++11 without writing so much boiler-plate!

why Vector doesn't provide the remove() member function while list provides?

If I want to delete all the elements with a value from vector,I call remove algorithm and then call vector's erase member function to physically delete it.
But in the case of list , simple call remove member function and it will delete all elements with that value.
I am not sure why vector does't provide the remove MF while list does it.
For Exp: I want to delete value '4' from vector v.
vector<int> v;
vector<int> ::iterator Itr;
for (int i=0; i< 6; i++)
v.push_back(i*2);
v.push_back(4);
v.push_back(8);
v.push_back(4);
v.erase(remove(v.begin(),v.end(),4), v.end());
and for list:
list.remove(4); // will delete all the element which has value 4
The question is not why std::vector does not offer the operation, but rather why does std::list offer it. The design of the STL is focused on the separation of the containers and the algorithms by means of iterators, and in all cases where an algorithm can be implemented efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can be implemented much more efficiently with knowledge of the container. That is the case of removing elements from a container. The cost of using the remove-erase idiom is linear in the size of the container (which cannot be reduced much), but that hides the fact that in the worst case all but one of those operations are copies of the objects (the only element that matches is the first), and those copies can represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity of the operation will still be linear, but the associated cost for each one of the elements removed is very low, a couple of pointer copies and releasing of a node in memory. At the same time, the implementation as part of the list can offer stronger guarantees: pointers, references and iterators to elements that are not erased do not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific container is std::list::sort, that uses mergesort that is less efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with iterators unless there is a strong reason to provide a particular implementation in a concrete container.
I believe the rationale is that std::list offers a very efficient remove method (if implemented as a doubly linked listed it just adjusts the pointers to the element and deallocates its storage), while element removal for std::vector is comparably slow.
The remove+erase trick is the standard way which works for all container types that offer the required iterator type. Presumably, the designers of the STL wanted to point out this difference. They could have opted to give all containers a remove method, which would be remove+erase for all containers except those who knew a better way, but they didn't.
This seems to violate the idea of generic code at a first glance, but I don't think it really does. Having a simple, generic remove that is easily accessible makes it easy to write generic code that compiles fine with all container types, but at the end generic code that would run extremely slow in 9 out of 10 cases and blazingly fast in the tenth is not truly generic, it just looks so.
The same pattern can be found at std::map::find vs. the generic std::find.
Removing an element from a vector is much slower than doing so for a list: it is (on average) proportional to the size of the vector, whereas the operation on a list is executed in constant time.
The designers of the standard library decided not to include this feature under the principle of "things that look easy should BE (computationally) easy".
I'm not sure whether I agree with this philosophy, but it's considered a design goal for C++.
Because dropping item from a list is cheap, while doing so on a vector is expensive - all following elements have to be shifted, i.e. copied/moved.
I imagine its due to efficiency, its slower to remove random elements from a vector than it is from a list. It might mean a little more typing for situations like this, but at least its obvious in the interface that the std::vector isn't the best data structure if you need to do this often.

Understanding Iterators in the STL

What exactly are iterators in the C++ STL?
In my case, I'm using a list, and I don't understand why you have to make an iterator std::list <int>::const_iterator iElementLocator; to display contents of the list by the derefrence operator:
cout << *iElementLocator; after assigning it to maybe list.begin().
Please explain what exactly an iterator is and why I have to dereference or use it.
There are three building blocks in the STL:
Containers
Algorithms
Iterators
At the conceptual level containers hold data. That by itself isn't very useful, because you want to do something with the data; you want to operate on it, manipulate it, query it, play with it. Algorithms do exactly that. But algorithms don't hold data, they have no data -- they need a container for this task. Give a container to an algorithm and you have an action going on.
The only problem left to solve is how does an algorithm traverse a container, from a technical point of view. Technically a container can be a linked list, or it can be an array, or a binary tree, or any other data structure that can hold data. But traversing an array is done differently than traversing a binary tree. Even though conceptually all an algorithm wants is to "get" one element at a time from a container, and then work on that element, the operation of getting the next element from a container is technically very container-specific.
It appears as if you'd need to write the same algorithm for each container, so that each version of the algorithm has the correct code for traversing the container. But there's a better solution: ask the container to return an object that can traverse over the container. The object would have an interface algorithms know. When an algorithm asks the object to "get the next element" the object would comply. Because the object came directly from the container it knows how to access the container's data. And because the object has an interface the algorithm knows, we need not duplicate an algorithm for each container.
This is the iterator.
The iterator here glues the algorithm to the container, without coupling the two. An iterator is coupled to a container, and an algorithm is coupled to the iterator's interface. The source of the magic here is really template programming. Consider the standard copy() algorithm:
template<class In, class Out>
Out copy(In first, In last, Out res)
{
while( first != last ) {
*res = *first;
++first;
++res;
}
return res;
}
The copy() algorithm takes as parameters two iterators templated on the type In and one iterator of type Out. It copies the elements starting at position first and ending just before position last, into res. The algorithm knows that to get the next element it needs to say ++first or ++res. It knows that to read an element it needs to say x = *first and to write an element it needs to say *res = x. That's part of the interface algorithms assume and iterators commit to. If by mistake an iterator doesn't comply with the interface then the compiler would emit an error for calling a function over type In or Out, when the type doesn't define the function.
I'm being lazy. So I would not type describing what an iterator is and how they're used, especially when there're already lots of articles online that you can read yourself.
Here are few that I can quote for a start, proividing the links to complete articles:
MSDN says,
Iterators are a generalization of
pointers, abstracting from their
requirements in a way that allows a
C++ program to work with different
data structures in a uniform manner.
Iterators act as intermediaries
between containers and generic
algorithms. Instead of operating on
specific data types, algorithms are
defined to operate on a range
specified by a type of iterator. Any
data structure that satisfies the
requirements of the iterator may then
be operated on by the algorithm. There
are five types or categories of
iterator [...]
By the way, it seems the MSDN has taken the text in bold from C++ Standard itself, specifically from the section §24.1/1 which says
Iterators are a generalization of
pointers that allow a C + + program to
work with different data structures
(containers) in a uniform manner. To
be able to construct template
algorithms that work correctly and
efficiently on different types of data
structures, the library formalizes not
just the interfaces but also the
semantics and complexity assumptions
of iterators. All iterators i support
the expression *i, resulting in a
value of some class, enumeration, or
built-in type T, called the value type
of the iterator. All iterators i for
which the expression (*i).m is
well-defined, support the expression
i->m with the same semantics as
(*i).m. For every iterator type X for
which equality is defined, there is a
corresponding signed integral type
called the difference type of the
iterator.
cplusplus says,
In C++, an iterator is any object
that, pointing to some element in a
range of elements (such as an array or
a container), has the ability to
iterate through the elements of that
range using a set of operators (at
least, the increment (++) and
dereference (*) operators).
The most obvious form of iterator is a
pointer [...]
And you can also read these:
What Is an Iterator?
Iterators in the Standard C++ Library
Iterator (at wiki entry)
Have patience and read all these. Hopefully, you will have some idea what an iterator is, in C++. Learning C++ requires patience and time.
An iterator is not the same as the container itself. The iterator refers to a single item in the container, as well as providing ways to reach other items.
Consider designing your own container without iterators. It could have a size function to obtain the number of items it contains, and could overload the [] operator to allow you to get or set an item by its position.
But "random access" of that kind is not easy to implement efficiently on some kinds of container. If you obtain the millionth item: c[1000000] and the container internally uses a linked list, it will have to scan through a million items to find the one you want.
You might instead decide to allow the collection to remember a "current" item. It could have functions like start and more and next to allow you to loop through the contents:
c.start();
while (c.more())
{
item_t item = c.next();
// use the item somehow
}
But this puts the "iteration state" inside the container. This is a serious limitation. What if you wanted to compare each item in the container with every other item? That requires two nested loops, both iterating through all the items. If the container itself stores the position of the iteration, you have no way to nest two such iterations - the inner loop will destroy the working of the outer loop.
So iterators are an independent copy of an iteration state. You can begin an iteration:
container_t::iterator i = c.begin();
That iterator, i, is a separate object that represents a position within the container. You can fetch whatever is stored at that position:
item_t item = *i;
You can move to the next item:
i++;
With some iterators you can skip forward several items:
i += 1000;
Or obtain an item at some position relative to the position identified by the iterator:
item_t item = i[1000];
And with some iterators you can move backwards.
And you can discover if you've reached beyond the contents of the container by comparing the iterator to end:
while (i != c.end())
You can think of end as returning an iterator that represents a position that is one beyond the last position in the container.
An important point to be aware of with iterators (and in C++ generally) is that they can become invalid. This usually happens for example if you empty a container: any iterators pointing to positions in that container have now become invalid. In that state, most operations on them are undefined - anything could happen!
An iterator is to an STL container what a pointer is to an array. You can think of them as pointer objects to STL containers. As pointers, you will be able to use them with the pointer notation (e.g. *iElementLocator, iElementLocator++). As objects, they will have their own attributes and methods (http://www.cplusplus.com/reference/std/iterator).
There already exists a lot of good explanations of iterators. Just google it.
One example.
If there is something specific you don't understand come back and ask.
I'd suggest reading about operator overloading in C++. This will tell why * and -> can mean essentially anything. Only then you should read about the iterators. Otherwise it might appear very confusing.

Advance iterator for the std::vector std::advance VS operator +?

I found myself writing the following a lot:
int location =2;
vector<int> vec;
vector<int>::iterator it=vec.begin();
/..../
std::advance(it, location);
instead of
it= it + 5;
what is the Preferred/Recommended way ?
Adding will only work with random access iterators. std::advance will work with all sorts of iterators. As long as you're only dealing with iterators into vectors, it makes no real difference, but std::advance keeps your code more generic (e.g. you could substitute a list for the vector, and that part would still work).
For those who care, the standard describes advance and distance as follows (§24.3.4/1):
Since only random access iterators provide + and - operators, the library provides two function templates advance and distance. These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.
Also note that starting with C++11, the standard added a parameter to std::next, so you can advance by a specified amount using it (and std::prev similarly). The difference from std::advance is that it returns the modified iterator (which std::advance doesn't), which can be convenient in some cases.
That depends on what you need:
If you need genericity, use std::advance(it,2). If someone comes along and changes your std::vector into a std::list, the code will still compile, even though advancing now takes linear time instead of constant time.
If you need performance, use it+=2. If someone comes along and changes your std::vector into a std::list, the code will fail to compile, pointing (maybe with a helpful comment) at a serious performance issue.
It depends on the iterator. it=it+5 is faster if it's supported (it's only supported on random access iterators). If you want to advance a less-capable iterator (e.g. a forward iterator, or a bidirectional iterator), then you can use std::advance, but it's slower because it actually walks across all of the intermediate elements.
std::advance works on non-random iterators too while the += version on works on random access sequences (vectors and the like).
std::adnvance is generic - it is useful if you don't always know type of underlying container - it works in all cases.
Yet it is efficient: std::advance will do an optimisation if it passed an RandomAccessIterator (like one from std::vector) and will increase iterator in loop for ForwardAccessIterator (as like one in std::list).
Use std::advance. It is just as efficient (it uses iterator traits to just do iterator addition for random access iterators), and is more general in that it works on other kinds of iterators as well.
If you're never going to change the container (and you probably aren't), use + because it's easy to see and understand and leaves the code less cluttered.
If you think you want to change the container, OR if you are working inside a template that might be instantiated on various container types, use advance because it works with anything.
As a general rule, I don't worry about changing container types because I've found that when I do have to change a container type, I end up revisiting everywhere that container is used anyway, just to be sure I'm not doing anything that's suddenly stupid (like randomly plucking elements out of the middle of a list).
I use += and + everywhere because it doesn't clutter the code and any so special iterator could also provide += and + operator overloads so the argument about generic programming doesn't really make sense from my point of view.