Just a little introduction, with simple words.
In C++, iterators are "things" on which you can write at least the dereference operator *it, the increment operator ++it, and for more advanced bidirectional iterators, the decrement --it, and last but not least, for random access iterators we need operator index it[] and possibly addition and subtraction.
Such "things" in C++ are objects of types with the according operator overloads, or plain and simple pointers.
std::vector<> is a container class that wraps a continuous array, so pointer as iterator makes sense. On the nets, and in some literature you can find vector.begin() used as a pointer.
The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.
Knowing this, my question is why modern STL implementations, let's say MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector iterators?
You're completely correct that vector::iterator could be implemented by a simple pointer (see here) -- in fact the concept of an iterator is based on that of a pointer to an array element. For other containers, such as map, list, or deque, however, a pointer won't work at all. So why is this not done? Here are three reasons why a class implementation is preferrable over a raw pointer.
Implementing an iterator as separate type allows additional functionality (beyond what is required by the standard), for example (added in edit following Quentins comment) the possibility to add assertions when dereferencing an iterator, for example, in debug mode.
overload resolution If the iterator were a pointer T*, it could be passed as valid argument to a function taking T*, while this would not be possible with an iterator type. Thus making std::vector<>::iterator a pointer in fact changes the behaviour of existing code. Consider, for example,
template<typename It>
void foo(It begin, It end);
void foo(const double*a, const double*b, size_t n=0);
std::vector<double> vec;
foo(vec.begin(), vec.end()); // which foo is called?
argument-dependent lookup (ADL; pointed out by juanchopanza) If you make an unqualified call, ADL ensures that functions in namespace std will be searched only if the arguments are types defined in namespace std. So,
std::vector<double> vec;
sort(vec.begin(), vec.end()); // calls std::sort
sort(vec.data(), vec.data()+vec.size()); // fails to compile
std::sort is not found if vector<>::iterator were a mere pointer.
The implementation of the iterator is implementation defined, so long as fulfills the requirements of the standard. It could be a pointer for vector, that would work. There are several reasons for not using a pointer;
consistency with other containers.
debug and error checking support
overload resolution, class based iterators allow for overloads to work differentiating them from plain pointers
If all the iterators were pointers, then ++it on a map would not increment it to the next element since the memory is not required to be not-contiguous. Past the contiguous memory of std:::vector most standard containers require "smarter" pointers - hence iterators.
The physical requirement's of the iterator dove-tail very well with the logical requirement that movement between elements it a well defined "idiom" of iterating over them, not just moving to the next memory location.
This was one of the original design requirements and goals of the STL; the orthogonal relationship between the containers, the algorithms and connecting the two through the iterators.
Now that they are classes, you can add a whole host of error checking and sanity checks to debug code (and then remove it for more optimised release code).
Given the positive aspects class based iterators bring, why should or should you not just use pointers for std::vector iterators - consistency. Early implementations of std::vector did indeed use plain pointers, you can use them for vector. Once you have to use classes for the other iterators, given the positives they bring, applying that to vector becomes a good idea.
The rationale for using a pointer is less overhead, higher
performance, especially if an optimizing compiler detects iteration
and does its thing (vector instructions and stuff). Using iterators
might be harder for the compiler to optimize.
It might be, but it isn't. If your implementation is not utter shite, a struct wrapping a pointer will achieve the same speed.
With that in mind, it's simple to see that simple benefits like better diagnostic messages (naming the iterator instead of T*), better overload resolution, ADL, and debug checking make the struct a clear winner over the pointer. The raw pointer has no advantages.
The rationale for using a pointer is less overhead, higher
performance, especially if an optimizing compiler detects iteration
and does its thing (vector instructions and stuff). Using iterators
might be harder for the compiler to optimize.
This is the misunderstanding at the heart of the question. A well formed class implementation will have no overhead, and identical performance all because the compiler can optimize away the abstraction and treat the iterator class as just a pointer in the case of std::vector.
That said,
MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector
iterators
because they view that adding a layer of abstraction class iterator to define the concept of iteration over a std::vector is more beneficial than using an ordinary pointer for this purpose.
Abstractions have a different set of costs vs benefits, typically added design complexity (not necessarily related to performance or overhead) in exchange for flexibility, future proofing, hiding implementation details. The above compilers decided this added complexity is an appropriate cost to pay for the benefits of having an abstraction.
Because STL was designed with the idea that you can write something that iterates over an iterator, no matter whether that iterator's just equivalent to a pointer to an element of memory-contiguous arrays (like std::array or std::vector) or something like a linked list, a set of keys, something that gets generated on the fly on access etc.
Also, don't be fooled: In the vector case, dereferencing might (without debug options) just break down to a inlinable pointer dereference, so there wouldn't even be overhead after compilation!
I think the reason is plain and simple: originally std::vector was not required to be implemented over contiguous blocks of memory.
So the interface could not just present a pointer.
source: https://stackoverflow.com/a/849190/225186
This was fixed later and std::vector was required to be in contiguous memory, but it was probably too late to make std::vector<T>::iterator a pointer.
Maybe some code already depended on iterator to be a class/struct.
Interestingly, I found implementations of std::vector<T>::iterator where this is valid and generated a "null" iterators (just like a null pointer) it = {};.
std::vector<double>::iterator it = {};
assert( &*it == nullptr );
Also, std::array<T>::iterator and std::initializer_list<T>::iterator are pointers T* in the implementations I saw.
A plain pointer as std::vector<T>::iterator would be perfectly fine in my opinion, in theory.
In practice, being a built-in has observable effects for metaprogramming, (e.g. std::vector<T>::iterator::difference_type wouldn't be valid, yes, one should have used iterator_traits).
Not-being a raw pointer has the (very) marginal advantage of disallowing nullability (it == nullptr) or default conductibility if you are into that. (an argument that doesn't matter for a generic programming point of view.)
At the same time the dedicated class iterators had a steep cost in other metaprogramming aspects, because if ::iterator were a pointer one wouldn't need to have ad hoc methods to detect contiguous memory (see contiguous_iterator_tag in https://en.cppreference.com/w/cpp/iterator/iterator_tags) and generic code over vectors could be directly forwarded to legacy C-functions.
For this reason alone I would argue that iterator-not-being-a-pointer was a costly mistake. It just made it hard to interact with C-code (as you need another layer of functions and type detection to safely forward stuff to C).
Having said this, I think we could still make things better by allowing automatic conversions from iterators to pointers and perhaps explicit (?) conversions from pointer to vector::iterators.
I got around this pesky obstacle by dereferencing and immediately referencing the iterator again. It looks ridiculous, but it satisfies MSVC...
class Thing {
. . .
};
void handleThing(Thing* thing) {
// do stuff
}
vector<Thing> vec;
// put some elements into vec now
for (auto it = vec.begin(); it != vec.end(); ++it)
// handleThing(it); // this doesn't work, would have been elegant ..
handleThing(&*it); // this DOES work
Related
C++17 introduced the concept of ContiguousIterator http://en.cppreference.com/w/cpp/iterator.
However it doesn't seem that there are plans to have a contiguous_iterator_tag (in the same way we now have random_access_iterator_tag) reported by std::iterator_traits<It>::iterator_category.
Why is contiguous_iterator_tag missing?
Is there a conventional protocol to determine if an iterator is Contiguous?
Or a compile time test?
In the past I mentioned that for containers if there is a .data() member that converts to a pointer to ::value type and there is .size() member convertible to pointer differences, then one should assume that the container is contiguous, but I can't pull an analogous feature of iterators.
One solution could be to have also a data function for contiguous iterators.
Of course the Contiguous concept works if &(it[n]) == (&(*it)) + n, for all n, but this can't be checked at compile time.
EDIT: I found this video which puts this in the more broader context of C++ concepts. CppCon 2016: "Building and Extending the Iterator Hierarchy in a Modern, Multicore World" by Patrick Niedzielski. The solution uses concepts (Lite) but at the end the idea is that contiguous iterators should implement a pointer_from function (same as my data(...) function).
The conclusion is that concepts will help formalizing the theory, but they are not magic, in the sense that someone, somewhere will define new especially named functions over iterators that are contiguous.
The talk generalizes to segmented iterators (with corresponding functions segment and local), unfortunatelly it doesn't say anything about strided pointers.
EDIT 2020:
The standard now has
struct contiguous_iterator_tag: public random_access_iterator_tag { };
https://en.cppreference.com/w/cpp/iterator/iterator_tags
Original answer
The rationale is given in N4284, which is the adopted version of the contiguous iterators proposal:
This paper introduces the term "contiguous iterator" as a refinement of random-access iterator, without introducing a corresponding contiguous_iterator_tag, which was found to break code during the Issaquah discussions of Nevin Liber's paper N3884 "Contiguous Iterators: A Refinement of Random Access Iterators".
Some code was broken because it assumed that std::random_access_iterator couldn't be refined, and had explicit checks against it. Basically it broke bad code that didn't rely on polymorphism to check for the categories of iterators, but it broke code nonetheless, so std::contiguous_iterator_tag was removed from the proposal.
Also, there was an additional problem with std::reverse_iterator-like classes: a reversed contiguous iterator can't be a contiguous iterator, but a regular random-access iterator. This problem could have been solved for std::reverse_iterator, but more user-defined iterator wrappers that augment an iterator while copying its iterator category would have either lied or stopped working correctly (for example Boost iterator adaptors).
C++20 update
Since my original answer above, std::contiguous_iterator_tag was brought back in the Ranges TS, then adopted in C++20. In order to avoid the issues mentioned above, the behaviour of std::iterator_traits<T>::iterator_category was not changed. Instead, user specializations of std::iterator_traits can now define an additional iterator_concept member type alias which is allowed to alias std::contiguous_iterator_tag or the previous iterator tags. The standard components has been updated accordingly in order to mark pointers and appropriate iterators a contiguous iterators.
The standard defines an exposition-only ITER_CONCEPT(Iter) which, given an iterator type Iter, will alias std::iterator_traits<Iter>::iterator_concept if it exists and std::iterator_traits<Iter>::iterator_category otherwise. There is no equivalent standard user-facing type trait, but ITER_CONCEPT is used by the new iterator concepts. It is a strong hint that you should use these iterator concepts instead of old-style tag dispatch to implement new functions whose behaviour depends on the iterator category. That said concepts are usable as boolean traits, so you can simply check that an iterator is a contiguous iterator as follows:
static_assert(std::contiguous_iterator<Iter>);
std::contiguous_iterator is thus the C++20 concept that you should use to detect that a given iterator is a random-access iterator (it also has a ranges counterpart: std::contiguous_range). It is worth noting that std::contiguous_iterator has a few additional constraints besides requiring that ITER_CONCEPT matches std::contiguous_iterator_tag: most notably it requires std::to_address(it) to be a valid expression returning a raw pointer type. std::to_address is a small utility function meant to avoid a few pitfalls that can occur when trying to retrieve the address where a contiguous iterator points - you can read more about the issues it solves in Helpful pointers for ContiguousIterator.
About implementing ITER_CONCEPT: it is worth nothing that implementing ITER_CONCEPT as described in the standard is only possible if you are the owner of std::iterator_traits because it requires detecting whether the primary instantiation of std::iterator_traits is used.
Angew made a comment that a vector using a raw pointer as it's iterator type was fine. That kinda threw me for a loop.
I started researching it and found that the requirement for vector iterators was only that they are "Random Access Iterators" for which it is explicitly stated that pointers qualify:
A pointer to an element of an array satisfies all requirements
Is the only reason that compilers even provide iterators to vector for debugging purposes, or is there actually a requirement I missed on vector?
§ 24.2.1
Since iterators are an abstraction of pointers, their semantics is a generalization of most of the semantics
of pointers in C++. This ensures that every function template that takes iterators works as well with
regular pointers.
So yes, using a pointer satisfies all of the requirements for a Random Access Iterator.
std::vector likely provides iterators for a few reasons
The standard says it should.
It would be odd if containers such as std::map or std::set provided iterators while std::vector provided only a value_type* pointer. Iterators provide consistency across the containers library.
It allows for specializations of the vector type eg, std::vector<bool> where a value_type* pointer would not be a valid iterator.
My 50 cents:
Iterators are generic ways to access any STL container. What I feel you're saying is: Since pointers are OK as a replacement of iterators for vectors, why are there iterators for vectors?
Well, who said you can't have duplicates in C++? Actually it's a good thing to have different interfaces to the same functionality. That should not be a problem.
On the other hand, think about libraries that have algorithms that use iterators. If vectors don't have iterators, it's just an invitation to exceptions (exceptions in the linguistic since, not programming sense). Every time one has to write an algorithm, he must do something different for vectors with pointers. But why? No reason for this hassle. Just interface everything the same way.
What those comments are saying is that
template <typename T, ...>
class vector
{
public:
typedef T* iterator;
typedef const T* const_iterator;
...
private:
T* elems; // pointer to dynamic array
size_t count;
...
}
is valid. Similarly a user defined container intended for use with std:: algorithms can do that. Then when a template asks for Container::iterator the type it gets back in that instantiation is T*, and that behaves properly.
So the standard requires that vector has a definition for vector::iterator, and you use that in your code. On one platform it is implemented as a pointer into an array, but on a different platform it is something else. Importantly these things behave the same way in all the aspects that the standard specifies.
In Herb Sutter's When Is a Container Not a Container?, he shows an example of taking a pointer into a container:
// Example 1: Is this code valid? safe? good?
//
vector<char> v;
// ...
char* p = &v[0];
// ... do something with *p ...
Then follows it up with an "improvement":
// Example 1(b): An improvement
// (when it's possible)
//
vector<char> v;
// ...
vector<char>::iterator i = v.begin();
// ... do something with *i ...
But doesn't really provide a convincing argument:
In general, it's not a bad guideline to prefer using iterators instead
of pointers when you want to point at an object that's inside a
container. After all, iterators are invalidated at mostly the same
times and the same ways as pointers, and one reason that iterators
exist is to provide a way to "point" at a contained object. So, if you
have a choice, prefer to use iterators into containers.
Unfortunately, you can't always get the same effect with iterators
that you can with pointers into a container. There are two main
potential drawbacks to the iterator method, and when either applies we
have to continue to use pointers:
You can't always conveniently use an iterator where you can use a pointer. (See example below.)
Using iterators might incur extra space and performance overhead, in cases where the iterator is an object and not just a bald
pointer.
In the case of a vector, the iterator is just a RandomAccessIterator. For all intents and purposes this is a thin wrapper over a pointer. One implementation even acknowledges this:
// This iterator adapter is 'normal' in the sense that it does not
// change the semantics of any of the operators of its iterator
// parameter. Its primary purpose is to convert an iterator that is
// not a class, e.g. a pointer, into an iterator that is a class.
// The _Container parameter exists solely so that different containers
// using this template can instantiate different types, even if the
// _Iterator parameter is the same.
Furthermore, the implementation stores a member value of type _Iterator, which is pointer or T*. In other words, just a pointer. Furthermore, the difference_type for such a type is std::ptrdiff_t and the operations defined are just thin wrappers (i.e., operator++ is ++_pointer, operator* is *_pointer) and so on.
Following Sutter's argument, this iterator class provides no benefits over pointers, only drawbacks. Am I correct?
For vectors, in non-generic code, you're mostly correct.
The benefit is that you can pass a RandomAccessIterator to a whole bunch of algorithms no matter what container the iterator iterates, whether that container has contiguous storage (and thus pointer iterators) or not. It's an abstraction.
(This abstraction, among other things, allows implementations to swap out the basic pointer implementation for something a little more sexy, like range-checked iterators for debug use.)
It's generally considered to be a good habit to use iterators unless you really can't. After all, habit breeds consistency, and consistency leads to maintainability.
Iterators are also self-documenting in a way that pointers are not. What does a int* point to? No idea. What does an std::vector<int>::iterator point to? Aha…
Finally, they provide a measure a type safety — though such iterators may only be thin wrappers around pointers, they needn't be pointers: if an iterator is a distinct type rather than a type alias, then you won't be accidentally passing your iterator into places you didn't want it to go, or setting it to "NULL" accidentally.
I agree that Sutter's argument is about as convincing as most of his other arguments, i.e. not very.
You can't always conveniently use an iterator where you can use a pointer
That is not one of the disadvantages. Sometimes it is just too "convenient" to get the pointer passed to places where you really didn't want them to go. Having a separate type helps in validating parameters.
Some early implementations used T* for vector::iterator, but it caused various problems, like people accidentally passing an unrelated pointer to vector member functions. Or assigning NULL to the iterator.
Using iterators might incur extra space and performance overhead, in cases where the iterator is an object and not just a bald pointer.
This was written in 1999, when we also believed that code in <algorithm> should be optimized for different container types. Not much later everyone was surprised to see that the compilers figured that out themselves. The generic algorithms using iterators worked just fine!
For a std::vector there is absolutely no space of time overhead for using an iterator instead of a pointer. You found out that the iterator class is just a thin wrapper over a pointer. Compilers will also see that, and generate equivalent code.
One real-life reason to prefer iterators over pointers is that they can be implemented as checked iterators in debug builds and help you catch some nasty problems early. I.e:
vector<int>::iterator it; // uninitialized iterator
it++;
or
for (it = vec1.begin(); it != vec2.end(); ++it) // different containers
Is it possible in C++11 to accumulate many std::vectors, each returned by given a function (whose API I cannot change), into a std container without copying any element?
std::vector<int> make_vect();
container acc; // what is container?
do {
acc.append(std::move(make_vect())); // how to implement this?
} while(acc.size() < n);
Note 1 that elements must not be copied even if they have no move constructor of move assignment operator, such as int in the example. So you can move a chunk of elements (by copying a pointer), but not individual elements.
Note 2 that container must allow for iteration through all accumulated elements using a single iterator. So std::vector<std::vector<>> or similar is not allowed.
Clearly, it is straightforward to write some container allowing this or to use std::list<std::vector<>> and provide your own iterator, but does the std library provide the desired functionality without such user-written additions?
It seems that the requested functionality is nothing particularly outlandish and I'm surprised how hard (if not impossible) it is even with C++11.
TL;DR I don't think a stock Standard Container can do what you ask. Here's why.
Remember that move semantics for containers are efficient because they are implemented as scope-bound handles to dynamically allocated memory. Moving a container is implemented as copying the handle, while not touching the contained elements.
Your first (explicit) constraint is not to copy any element of any of the containers. That necessitates copying the handles into your putative acc_container. In other words, you want a acc_container<std::vector<T>>. Any of the Standard Containers will allow you to do that efficiently no matter how big individual T elements.
Your second (implicit, inferred from the comments) constraint is that you want to have a uniform interface over all the elements of all individual vectors. In other words, you would like to use this as acc_container<T>. This requires an extra level of indirection in the iterators of your acc_container, where the iterator detects it has reached the end of one of the current vector<T>, and jump to the beginning of the next vector<T>.
Such a container does not exist in the Standard Library.
The easiest work-around is to use a std::vector<std::vector<T>> (to avoid copying T elements), and write your own iterator adaptors (e.g. using boost::indirect_iterator, to provide iteration over T elements).
Unfortunately, even if you provide non-member begin() / end() functions that initialize these indirect iterators from the member .begin()/.end(), range-for will not use ADL to look up these functions because it will prefere the old member functions .begin() / .end(). Furthermore, you will not be able to e.g. insert() a T element directly into your compound container, unless you also provide non-member insert() (and similarly for other functionality).
So if you want a bona fide compound container, with range-for support and a native interface of member functions, you need to write one yourself (with std::vector<std::vector<T> as back-end, and the entire std::vector<T> interface written on top of it). Maybe you can suggest it on the Boost mailinglist as a nice project.
UPDATE: here's a link to an old paper by Matt Austern on Segmented Iterators and Hierarchial Algorithms that shows some performance benefits from this approach. The downside is that you also need to make standard algorithms aware of these iterators.
Most of the time, STL iterators are CopyConstructable, because several STL algorithms require this to improve performance, such as std::sort.
However, I've been working on a pet project to wrap the FindXFile API (previously asked about), but the problem is it's impossible to implement a copyable iterator around this API. A find handle cannot be duplicated by any means -- DuplicateHandle specifically forbids passing these types of handles to it. And if you just maintain a reference count to the find handle, then a single increment by any copy results in an increment of all copies -- clearly that is not what a copy constructed iterator is supposed to do.
Since I can't satisfy the traditional copy constructible requirement for iterators here, is it even worth trying to create an "STL style" iterator? On one hand, creating some other enumeration method is going to not fall into normal STL conventions, but on the other, following STL conventions are going to confuse users of this iterator if they try to CopyConstruct it later.
Which is the lesser of two evils?
An input iterator which is not a forward iterator is copyable, but you can only "use" one of the copies: incrementing any of them invalidates the others (dereferencing one of them does not invalidate the others). This allows it to be passed to algorithms, but the algorithm must complete with a single pass. You can tell which algorithms are OK by checking their requirements - for example copy requires only an InputIterator, whereas adjacent_find requires a ForwardIterator (first one I found).
It sounds to me as though this describes your situation. Just copy the handle (or something which refcounts the handle), without duplicating it.
The user has to understand that it's only an InputIterator, but in practice this isn't a big deal. istream_iterator is the same, and for the same reason.
With the benefit of C++11 hindsight, it would almost have made sense to require InputIterators to be movable but not to require them to be copyable, since duplicates have limited use anyway. But that's "limited use", not "no use", and anyway it's too late now to remove functionality from InputIterator, considering how much code relies on the existing definition.