What is std::contiguous_iterator useful for?

For what purposes can I use it?
Why is it better than random_access_iterator?
Is there any advantage to using it?

For a contiguous iterator you can get a pointer to the element the iterator is "pointing" to, and use it like a pointer to a contiguous array.
That can't be guaranteed with a random access iterator.
Remember that e.g. std::deque is a random-access container, but it's typically not a contiguous container (as opposed to std::vector, which is both random access and contiguous).
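A minimal sketch of that difference (the values are illustrative):

#include <deque>
#include <memory>
#include <vector>

int main()
{
    std::vector<int> vec{1, 2, 3, 4};
    std::deque<int>  deq{1, 2, 3, 4};

    // vector: a pointer to one element reaches the others via arithmetic.
    int* p = std::addressof(*vec.begin());
    int second = *(p + 1);   // OK: the storage is one contiguous array

    // deque: random access, but elements live in separate blocks, so the
    // same arithmetic on a pointer to an element is not guaranteed to work.
    int* q = std::addressof(*deq.begin());
    // *(q + 1) is not guaranteed to be deq[1]

    (void)second; (void)q;
}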

In C++17, there is no such thing as std::contiguous_iterator. There is, however, the ContiguousIterator named requirement. This represents a random access iterator over a sequence of elements where each element is stored contiguously, in exactly the same way as an array. This means that, given a pointer to a value_type obtained from an iterator, it is possible to perform pointer arithmetic on that pointer, and it shall work in exactly the same way as performing the same arithmetic on the corresponding iterators.
The purpose of this is to allow for more efficient implementations of algorithms on iterators that are contiguous. Or to forbid algorithms from being used on iterators that aren't contiguous. One example of where this matters is if you're trying to pass C++ iterators into a C interface which is based on pointers to arrays. You can wrap such interfaces behind generic algorithms, verifying the contiguity of the iterator in the template.
Or at least, you could in theory; in C++17, that wasn't really possible. The reason is that there was no actual way to test whether an iterator was a ContiguousIterator. There is no way to ask an iterator whether pointer arithmetic on a pointer to one of its elements is legal. And there was no std::contiguous_iterator_category one could use for such iterators (as adding one could have caused compatibility problems). So you couldn't use SFINAE tools to verify that an iterator was contiguous.
C++20's std::contiguous_iterator concept resolves this problem. It also resolves the other problem with contiguous iterators. See, the above explanation for ContiguousIterator's behavior starts with us having a pointer to an element from the range. Well, how did you get that? The obvious method would be to do something like std::addressof(*it), but what if it is the end iterator? The end iterator is not dereferenceable, so you can't do that. Basically, even if you know that an iterator is contiguous, how do you go about converting it to the equivalent pointer?
The std::contiguous_iterator concept solves both of these problems. std::to_address is available, which will convert any contiguous iterator into its equivalent pointer value. And there is a traits tag (std::contiguous_iterator_tag, reported through the iterator's iterator_concept type) that an iterator must provide to denote that it is in fact a contiguous iterator, just in case the default std::to_address implementation happens to be valid for a non-contiguous iterator.
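A minimal sketch of wrapping a pointer-based C interface behind a constrained template (c_api_fill and fill_range are made-up names standing in for any such interface):

#include <concepts>
#include <cstddef>
#include <iterator>
#include <memory>

// Hypothetical C-style interface that only understands pointers to arrays.
void c_api_fill(double* data, std::size_t count);

// The std::contiguous_iterator constraint guarantees that
// std::to_address(first) yields a pointer usable for the whole range, and
// std::to_address(last) is valid even though *last would not be.
template <std::contiguous_iterator It>
    requires std::same_as<std::iter_value_t<It>, double>
void fill_range(It first, It last)
{
    c_api_fill(std::to_address(first),
               static_cast<std::size_t>(last - first));
}

With this, fill_range(vec.begin(), vec.end()) compiles for a std::vector<double>, while a std::deque<double> is rejected at the constraint.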

A random access iterator only requires that (iterator) + (offset) be a constant-time operation, whereas a contiguous iterator makes the stronger guarantee that std::addressof(*((iterator) + (offset))) == std::addressof(*(iterator)) + (offset) (disregarding overloaded operator&s).
This basically means that the iterator is a pointer or a light wrapper around a pointer, so it is equivalent to a pointer to its elements, whereas a random access iterator can do more, at the cost of possibly being bulkier and not being convertible to a simple pointer.

As a C++20 concept, I would expect you can use it to select a different algorithm when the iterator is contiguous, perhaps one exploiting cache locality; a sketch of such dispatch follows.
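A minimal sketch of concept-based dispatch (copy_ints is a made-up name; std::contiguous_iterator subsumes std::random_access_iterator, so overload resolution prefers the bulk version when it applies):

#include <concepts>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <memory>

// Generic fallback: element-by-element copy for any random access iterator.
template <std::random_access_iterator It>
void copy_ints(It first, It last, int* out)
{
    for (; first != last; ++first)
        *out++ = *first;
}

// More constrained overload: contiguous iterators over int permit one bulk
// byte copy via memcpy.
template <std::contiguous_iterator It>
    requires std::same_as<std::iter_value_t<It>, int>
void copy_ints(It first, It last, int* out)
{
    std::memcpy(out, std::to_address(first),
                static_cast<std::size_t>(last - first) * sizeof(int));
}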

Related

What are valid positions in vector::insert()?

This question is related to Item 16 of the book Effective STL, which states that when using a vector (let's assume std::vector<int> vec) instead of an array in legacy code, we must use &vec[0] instead of vec.begin():
void doSomething(const int* pInts, size_t numInts);
doSomething(&vec[0], vec.size());     // correct!!
doSomething(vec.begin(), vec.size()); // wrong!! why???
The book states that vec.begin() is not the same as &vec[0]. Why? What is the difference between the two?
A std::vector is a sequence container that encapsulates a dynamically sized array. This lets you conveniently store a bunch of elements without needing to be as concerned with managing the underlying array that is the storage for your elements. A large part of the convenience of using these classes comes from the fact that they give you a bunch of methods that let you deal with the sequence without needing to deal with raw pointers; an iterator is an example of this.
&vec[0] is a pointer to the first element of the underlying storage that the vector is using. vec.begin() is an iterator that starts at the beginning of the vector. While both of these give you a way to access the elements in the sequence, they are two distinct concepts. Read up on iterators to get a better idea of how they work.
If your code supports iterators, it's often easiest to use the iterators to iterate over the data. Part of the reason for this is that iterators are not pointers; they let you iterate over the elements of the data structure without needing to know as much about the implementation details of the data structure you are iterating over.
However, sometimes you need the raw array of items; for example, some legacy APIs or calls to C code might require a pointer to the array. In this case you have no choice but to extract the raw array from the vector, which you can do using something such as &vec[0]. Note that if you have C++11 support, there is an explicit way to do this: std::vector::data gives you access to the underlying storage array. The C++11 way has the additional benefit of stating your intent more clearly to people reading your code.
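A minimal sketch of both options (caller is a made-up name; doSomething is the legacy function from the question):

#include <cstddef>
#include <vector>

void doSomething(const int* pInts, std::size_t numInts);  // legacy C-style API

void caller(const std::vector<int>& vec)
{
    if (!vec.empty())  // &vec[0] is invalid on an empty vector
    {
        doSomething(vec.data(), vec.size());  // C++11: states the intent clearly
        doSomething(&vec[0], vec.size());     // pre-C++11 equivalent
    }
}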
Formally, one produces an iterator and the other a pointer, but I think the major difference is that vec[0] has undefined behavior if the vector is empty, while vec.begin() does not.
vec.begin() has type std::vector<int>::iterator. &vec[0] has type pointer to std::vector<int>::value_type. These are not necessarily the same type.
It is possible that a given implementation uses pointers as the iterator implementation for a vector, but this is not guaranteed, and thus you should not rely on that assumption. In fact most implementations do provide a wrapping iterator type.
Regarding your question about pointers being iterators, this is partly true. Pointers do meet the criteria of a random access iterator, but not all iterators are pointers. And there are iterators that do not support the random access behavior of pointers.

Are all pointers considered iterators?

This was a question on my exam and the answer is that all pointers are iterators but not all iterators are pointers. Why is this the case?
In a statement such as:
int *p = new int(4);
How can p be considered an iterator at all?
"Iterator" is some abstract concept, describing a certain set of operations a type must support with some specific semantics.
Pointers are iterators because they fulfill the iterator concept (and, even stronger, the random access iterator concept): for example, operator++ moves to the next element and operator* accesses the underlying element.
In your particular example, you get a standard iterator range with
[p, p+1)
which can be used, for example, in the standard algorithms, like any iterator pair. (It may not be particularly useful, but it is still valid.) The above holds true for all "valid" pointers, that is, pointers that point to some object.
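A minimal sketch of that (the array range is added for illustration):

#include <algorithm>
#include <iostream>

int main()
{
    int* p = new int(4);

    // [p, p + 1) is a valid one-element iterator range:
    std::for_each(p, p + 1, [](int x) { std::cout << x << '\n'; });

    // Pointers into an array form more useful ranges:
    int arr[] = {3, 1, 2};
    std::sort(arr, arr + 3);   // pointers used as random access iterators

    delete p;
}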
The converse implication, however, is false: consider std::list<T>::iterator, for example. That is still an iterator, but it cannot be a pointer, because it does not have an operator[] (which every pointer has).

Why is vector::iterator invalidated upon reallocation?

I don't understand why a vector's iterator should be invalidated when a reallocation happens.
Couldn't this have been prevented simply by storing an offset -- instead of a pointer -- in the iterator?
Why was vector not designed this way?
Just to add a citation to the performance-related justification: when designing C++, Stroustrup thought it was vital that template classes like std::vector approach the performance characteristics of native arrays:
One reason for the emphasis on run-time efficiency...was that I wanted
templates to be efficient enough in time and space to be used for
low-level types such as arrays and lists.
...
Higher-level alternatives -- say, a range-checked array with a size()
operation, a multidimensional array, a vector type with proper numeric
vector operations and copy semantics, etc. -- would be accepted by
users only if their run-time, space, and notational convenience
approached those of built-in arrays.
In other words, the language mechanism supplying parameterized types
should be such that a concerned user should be able to afford to
eliminate the use of arrays in favor of a standard library class.
Bjarne Stroustrup, Design and Evolution of C++, p.342.
Because for iterators to do that, they'd need to store a pointer to the vector object. For each data access, they'd need to follow the pointer to the vector, then follow the pointer therein to the current location of the data array, then add the offset * the element size. That'd be much slower, and need more memory for the size_type member.
Certainly, it's a good compromise sometimes and it would be nice to be able to choose it when wanted, but it's slower and bulkier than (C-style) direct array usage. std::vector was ruthlessly scrutinised for performance when the STL was being introduced, and the normal implementation is optimised for space and speed over this convenience/reliability factor, just as the array-equivalent operator[] is as fast as arrays but less safe than at().
You can add safety by wrapping the standard std::vector<T>::iterator, but you can't add speed by wrapping an extension::vector<T>::safe_iterator. That's a general principle, and explains many C++ design choices.
There are many reasons for these decisions. As others pointed out, the most basic implementation of an iterator for a vector is a plain pointer to the element. To be able to handle push_back, iterators would have to be modified to hold a pointer to the vector and a position; on each access through the operator, the vector pointer would have to be dereferenced, the pointer to the data obtained, and the position added, with an extra dereference.
While that would not be the most efficient implementation, that is not really a limiting factor. The default implementation of iterators in the VS/Dinkumware libraries (even in release mode) is checked iterators, which manage an equivalent amount of information.
The actual problem comes with other mutating operations. Consider inserting/erasing in the middle of the vector. To maintain validity of all iterators, the container would have to track all the instances of iterators and adapt the position field so that they still refer to the same element (that has been displaced by the insertion/removal).
You would need to store both the offset and a pointer to the vector object itself.
As specified, the iterator can just be a pointer, which takes less space.
TL;DR -- because you're trading simple rules for invalidation for far more complicated action-at-a-distance ones.
Please note that "store a pointer to the vector object" would cause new invalidation cases. For example, today swap preserves iterator validity; if a pointer (or reference) to the vector were stored inside iterators, it no longer could. All operations that move the vector metadata itself (vector-of-vectors, anyone?) would invalidate iterators.
You would be trading "iterator becomes invalid when a pointer/reference to the element is invalidated" for "iterator becomes invalid when a pointer/reference to the vector is invalidated".
The performance arguments don't much matter, because the proposed alternate implementation is not even correct.
If an iterator weren't invalidated, should it point to the same element or to the same position after an insertion before it? In other words, even if there were no performance issues, it is non-trivial to decide which alternative definition to use.
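A minimal sketch of the invalidation rule being discussed (the reserve call makes the second push_back exceed capacity and reallocate):

#include <vector>

int main()
{
    std::vector<int> v;
    v.reserve(1);
    v.push_back(42);

    std::vector<int>::iterator it = v.begin();
    v.push_back(7);   // exceeds capacity: reallocation moves the elements

    // *it;           // undefined behavior: 'it' points into the old storage
}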

Use of iterators over array indices

I just wanted to know what the main advantage of using iterators over array indices is. I have googled, but I am not finding a clear answer.
I presume you are talking about when using a vector, right?
The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).
All of the standard containers provide the iterator concept. An iterator knows how to find the next element in the container, especially when the underlying structure isn't array-like. Array-style operator[] isn't provided by every container, so getting in the habit of using iterators will make more consistent-looking code, regardless of the container you choose.
You can abstract the collection implementation away.
To expand upon previous answers:
Writing a loop with operator[] constrains you to a container that supports [] and uses the same index/size type. Otherwise you'd need to rewrite every loop to change the container.
Even if your container supports [], it may not be optimal for sequential traversing. [] is basically a random-access operator, which is O(1) for vector but could be as bad as O(n) depending on the underlying container.
This is a minor point, but if you use iterators, your loop could be more easily moved to using the standard algorithms, e.g. std::for_each.
There are many data structures, e.g. hash tables and linked lists, that cannot be naturally or quickly indexed, but they are indeed traversable. Iterators act as an interface that lets you walk over any data structure without knowing the actual implementation of the source.
The STL contains algorithms, such as transform and for_each that operate on containers. They don't accept indices, but use iterators.
Iterators help hide the container implementation and allow the programmer to focus more on the algorithm. The for_each function can be applied to anything that supports an input iterator.
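A minimal sketch of that container independence (print_all is a made-up helper name):

#include <algorithm>
#include <iostream>
#include <list>
#include <vector>

// The same iterator-based code walks any container.
template <typename Container>
void print_all(const Container& c)
{
    std::for_each(c.begin(), c.end(),
                  [](const auto& x) { std::cout << x << ' '; });
}

int main()
{
    std::vector<int> v{1, 2, 3};
    std::list<int>   l{4, 5, 6};   // no operator[], but iteration works the same
    print_all(v);
    print_all(l);
}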
As well as the points in other answers, iterators can also be faster (specifically compared to operator[]), since they are essentially iteration by pointer. If you do something like:
for (int i = 0; i < 10; ++i)
{
    my_vector[i].DoSomething();
}
Every iteration of the loop unnecessarily calculates my_vector.begin() + i. If you use iterators, incrementing the iterator means it's already pointing to the next element, so you don't need that extra calculation. It's a small thing, but can make a difference in tight loops.
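For comparison, a sketch of the iterator form of the loop above (same my_vector and DoSomething):

// The increment moves the iterator directly to the next element; no
// per-iteration index arithmetic is needed.
for (auto it = my_vector.begin(); it != my_vector.end(); ++it)
{
    it->DoSomething();
}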
One other slight difference is that you can't use erase() on an element of a vector by index; you must have an iterator. No big deal, since you can always use vect.begin() + index as your iterator, but there are other considerations. For example, if you do this then you must always check your index against size() and not some variable you assigned that value.
None of that is really too much worth worrying about but if given the choice I prefer iterator access for the reasons already stated and this one.
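A minimal sketch of that conversion (erase_at is a made-up name; the caller must ensure the index is in range):

#include <cstddef>
#include <vector>

void erase_at(std::vector<int>& vect, std::size_t index)
{
    // erase() takes an iterator, so the index must be converted first.
    vect.erase(vect.begin() + static_cast<std::ptrdiff_t>(index));
}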
I would say it's more a matter of consistency and code reuse.
Consistency in that you will use all other containers with iterators
Code reuse in that algorithms written for iterators cannot be used with the subscript operator and vice versa... and the STL has lots of algorithms so you definitely want to build on it.
Finally, I'd like to say that even C arrays have iterators.
const Foo* someArray = /* ... */;
Foo* otherArray = new Foo[someArrayLength];
std::copy(someArray, someArray + someArrayLength, otherArray);
The iterator_traits class has been specialized so that pointers are a model of RandomAccessIterator.

STL-Like range, What could go wrong if I did this?

I am writing (as a self-teaching exercise) a simple STL-like range. It is an immutable random-access "container". My range keeps only its start element, the number of elements, and the step size (the difference between two consecutive elements):
struct range
{
    ...
private:
    value_type m_first_element, m_element_count, m_step;
};
Because my range doesn't hold the elements, it calculates the desired element using the following:
// In the standard, operator[]
// should return a const reference.
// Because range doesn't store its elements
// internally, we return a copy of the value.
value_type operator[](size_type index)
{
    return m_first_element + m_step * index;
}
As you can see, I am not returning a const reference as the standard says. Now, can I assume that a const reference and a copy of the element are the same in terms of using the non-mutating algorithms in the standard library?
Any advice about the subject is greatly appreciated.
@Steve Jessop: Good point that you mentioned iterators.
Actually, I used the SGI STL documentation as my reference. At the end of that page, it says:
Assuming x and y are iterators from the same range:
Identity invariant: x == y if and only if &*x == &*y
So, it boils down to the same original question I've asked actually :)
The standard algorithms don't really use operator[], they're all defined in terms of iterators unless I've forgotten something significant. Is the plan to re-implement the standard algorithms on top of operator[] for your "ranges", rather than iterators?
Where non-mutating algorithms do use iterators, they're all defined in terms of *it being assignable to whatever it needs to be assignable to, or otherwise valid for some specified operation or function call. I think all or most such ops are fine with a value.
The one thing I can think of, is that you can't pass a value where a non-const reference is expected. Are there any non-mutating algorithms which require a non-const reference? Probably not, provided that any functor parameters etc. have enough const in them.
So sorry, I can't say definitively that there are no odd corners that go wrong, but it sounds basically OK to me. Even if there are any niggles, you may be able to fix them with very slight differences in the requirements between your versions of the algorithms and the standard ones.
Edit: a second thing that could go wrong is taking pointers/references and keeping them too long. As far as I can remember, standard algorithms don't keep pointers or references to elements - the reason for this is that it's containers which guarantee the validity of pointers to elements, iterator types only tell you when the iterator remains valid (for instance a copy of an input iterator doesn't necessarily remain valid when the original is incremented, whereas forward iterators can be copied in this way for multi-pass algorithms). Since the algorithms don't see containers, only iterators, they have no reason that I can think of to be assuming the elements are persistent.
Items in STL containers are expected to be copied around all the time; think about when a vector has to be reallocated, for example. So, your example is fine, except that it only works with random access iterators. But I suspect the latter is probably by design. :-P
Do you want your range to be usable in STL algorithms? Wouldn't it be better off with the first and last elements? (Considering the fact that end() is often required/used, you will have to pre-calculate it for performance.) Or, are you counting on the contiguous elements (which is my second point)?
Since you put "container" in "quotes" you can do whatever you want.
STL-type things (iterators, various member functions on containers, ...) return references because references are lvalues, and certain constructs (e.g., myvec[i] = otherthing) can then compile. Think of operator[] on std::map. As for const references, a reference rather than a value is returned just to avoid the copy, I suppose.
This rule is violated all the time, though, when convenient. It's also common to have iterator classes store the current value in a member variable purely for the purpose of returning a reference or const reference (yes, this reference would be invalidated if the iterator is advanced); a minimal sketch of that pattern follows.
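A sketch of that caching pattern for a range-like iterator over computed ints (all names are illustrative, not from any real library):

class range_iterator
{
public:
    range_iterator(int first, int step, int index)
        : m_first(first), m_step(step), m_index(index) {}

    // Recompute and cache the element so a const reference can be returned.
    // The reference is invalidated as soon as the iterator advances.
    const int& operator*() const
    {
        m_current = m_first + m_step * m_index;
        return m_current;
    }

    range_iterator& operator++() { ++m_index; return *this; }

    bool operator!=(const range_iterator& rhs) const
    {
        return m_index != rhs.m_index;
    }

private:
    int m_first, m_step, m_index;
    mutable int m_current = 0;   // cached value backing the returned reference
};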
If you're interested in this sort of stuff you should check out the boost iterator library.