STL-like range: what could go wrong if I did this? - C++

I am writing (as a self-teaching exercise) a simple STL-like range. It is an immutable random-access "container". My range keeps only its start element, the number of elements, and the step size (the difference between two consecutive elements):
struct range
{
...
private:
value_type m_first_element, m_element_count, m_step;
};
Because my range doesn't hold the elements, it calculates the desired element using the following:
// The standard says operator[]
// should return a const reference.
// Because range doesn't store its elements
// internally, we return a copy of the value.
value_type operator[](size_type index) const
{
    return m_first_element + m_step * index;
}
As you can see, I am not returning a const reference as the standard says. Now, can I assume that a const reference and a copy of the element are the same in terms of using the non-mutating algorithms in the standard library?
Any advice about the subject is greatly appreciated.
@Steve Jessop: Good point that you mentioned iterators.
Actually, I used the SGI documentation as my reference. At the end of that page, it says:
Assuming x and y are iterators from the same range:
Invariant (identity): x == y if and only if &*x == &*y
So it boils down to the same original question I asked, actually :)

The standard algorithms don't really use operator[], they're all defined in terms of iterators unless I've forgotten something significant. Is the plan to re-implement the standard algorithms on top of operator[] for your "ranges", rather than iterators?
Where non-mutating algorithms do use iterators, they're all defined in terms of *it being assignable to whatever it needs to be assignable to, or otherwise valid for some specified operation or function call. I think all or most such ops are fine with a value.
The one thing I can think of, is that you can't pass a value where a non-const reference is expected. Are there any non-mutating algorithms which require a non-const reference? Probably not, provided that any functor parameters etc. have enough const in them.
So sorry, I can't say definitively that there are no odd corners that go wrong, but it sounds basically OK to me. Even if there are any niggles, you may be able to fix them with very slight differences in the requirements between your versions of the algorithms and the standard ones.
Edit: a second thing that could go wrong is taking pointers/references and keeping them too long. As far as I can remember, standard algorithms don't keep pointers or references to elements - the reason for this is that it's containers which guarantee the validity of pointers to elements, iterator types only tell you when the iterator remains valid (for instance a copy of an input iterator doesn't necessarily remain valid when the original is incremented, whereas forward iterators can be copied in this way for multi-pass algorithms). Since the algorithms don't see containers, only iterators, they have no reason that I can think of to be assuming the elements are persistent.
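As a rough illustration of the point above (all names here are hypothetical, not taken from the question), here is a minimal sketch of an iterator for such a range whose operator* returns a value rather than a reference. Non-mutating algorithms such as std::find only read the result, so this works in practice even though it is only an input iterator by the classic requirements:
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>

struct range_iterator
{
    using iterator_category = std::input_iterator_tag;
    using value_type        = long;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const long*;
    using reference         = long;   // note: a value, not a real reference

    long first, step, index;          // element i is first + step * i

    long operator*() const { return first + step * index; }
    range_iterator& operator++() { ++index; return *this; }
    bool operator==(const range_iterator& o) const { return index == o.index; }
    bool operator!=(const range_iterator& o) const { return index != o.index; }
};

int main()
{
    range_iterator begin{10, 3, 0}, end{10, 3, 5};   // elements 10, 13, 16, 19, 22

    // std::find only compares *it against the value and increments,
    // so a by-value operator* is enough here.
    auto it = std::find(begin, end, 16);
    std::cout << (it != end ? "found\n" : "not found\n");
}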

Items in STL containers are expected to be copied around all the time; think about when a vector has to be reallocated, for example. So, your example is fine, except that it only works with random access iterators. But I suspect the latter is probably by design. :-P

Do you want your range to be usable in STL algorithms? Wouldn't it be better off with the first and last elements? (Considering the fact that end() is oft required/used, you will have to pre-calculate it for performance.) Or, are you counting on the contiguous elements (which is my second point)?

Since you put "container" in "quotes" you can do whatever you want.
STL-type things (iterators, various member functions on containers, ...) return references because references are lvalues, so certain constructs (e.g. myvec[i] = otherthing) can compile. Think of operator[] on std::map. Const references are returned rather than values just to avoid the copy, I suppose.
This rule is violated all the time though, when convenient. It's also common to have iterator classes store the current value in a member variable purely for the purpose of returning a reference or const reference (yes, this reference would be invalid if the iterator is advanced).
If you're interested in this sort of stuff you should check out the boost iterator library.
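As a rough sketch of the caching trick mentioned above (names are hypothetical), the iterator stores the current value in a member so operator* can hand out a const reference; that reference becomes invalid as soon as the iterator advances:
#include <cstddef>
#include <iterator>

class cached_range_iterator
{
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = long;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const long*;
    using reference         = const long&;

    cached_range_iterator(long first, long step, long index)
        : m_first(first), m_step(step), m_index(index),
          m_value(first + step * index) {}

    reference operator*() const { return m_value; }   // reference into *this
    cached_range_iterator& operator++()
    {
        ++m_index;
        m_value = m_first + m_step * m_index;         // refresh the cached element
        return *this;
    }
    bool operator!=(const cached_range_iterator& o) const { return m_index != o.m_index; }

private:
    long m_first, m_step, m_index;
    long m_value;   // cached element; &*it points here, not into a container
};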

Related

What is std::contiguous_iterator useful for?

For what purposes can I use it?
Why is it better than random_access_iterator?
Is there some advantage if I use it?
For a contiguous iterator you can get a pointer to the element the iterator is "pointing" to, and use it like a pointer to a contiguous array.
That can't be guaranteed with a random access iterator.
Remember that e.g. std::deque is a random-access container, but it's typically not a contiguous container (as opposed to std::vector which is both random access and contiguous).
In C++17, there is no such thing as a std::contiguous_iterator. There is the ContiguousIterator named requirement, however. This represents a random access iterator over a sequence of elements where each element is stored contiguously, in exactly the same way as an array. This means that it is possible, given a pointer to a value_type from an iterator, to perform pointer arithmetic on that pointer, which shall work in exactly the same way as performing the same arithmetic on the corresponding iterators.
The purpose of this is to allow for more efficient implementations of algorithms on iterators that are contiguous. Or to forbid algorithms from being used on iterators that aren't contiguous. One example of where this matters is if you're trying to pass C++ iterators into a C interface which is based on pointers to arrays. You can wrap such interfaces behind generic algorithms, verifying the contiguity of the iterator in the template.
Or at least, you could in theory; in C++17, that wasn't really possible. The reason is that there was not actually a way to test whether an iterator was a ContiguousIterator. There's no way to ask an iterator whether doing pointer arithmetic on a pointer to one of its elements is legal. And there was no std::contiguous_iterator_category one could use for such iterators (as this could cause compatibility problems). So you couldn't use SFINAE tools to verify that an iterator was contiguous.
C++20's std::contiguous_iterator concept resolves this problem. It also resolves the other problem with contiguous iterators. See, the above explanation for ContiguousIterator's behavior starts with us having a pointer to an element from the range. Well, how did you get that? The obvious method would be to do something like std::addressof(*it), but what if it is the end iterator? The end iterator is not dereference-able, so you can't do that. Basically, even if you know that an iterator is contiguous, how do you go about converting it to the equivalent pointer?
The std::contiguous_iterator concept solves both of these problems. std::to_address is available, which will convert any contiguous iterator into its equivalent pointer value. And there is a traits tag that an iterator must provide to denote that it is in fact a contiguous iterator, just in case the default to_address implementation happens to be valid for a non-contiguous iterator.
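A small C++20 sketch of wrapping a pointer-based C interface behind a generic function that insists on contiguous iterators; c_sum here is a made-up stand-in for some C API taking (pointer, length), not a real library call:
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

double c_sum(const double* data, std::size_t n)   // pretend this lives in a C library
{
    double s = 0;
    for (std::size_t i = 0; i < n; ++i) s += data[i];
    return s;
}

template <std::contiguous_iterator It>
double sum(It first, It last)
{
    // std::to_address converts the iterator to its equivalent pointer,
    // and pointer arithmetic on the result is well defined for the range.
    return c_sum(std::to_address(first),
                 static_cast<std::size_t>(last - first));
}

int main()
{
    std::vector<double> v{1.0, 2.0, 3.0};
    double total = sum(v.begin(), v.end());   // vector's iterators are contiguous
    (void)total;
}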
A random access iterator only requires a constant time (iterator) + (offset), whereas contiguous iterators have the stronger guarantee that std::addressof(*((iterator) + (offset))) == std::addressof(*(iterator)) + (offset) (disregarding overloaded operator&s).
This basically means that the iterator is a pointer or a light wrapper around a pointer, so it is equivalent to a pointer to its elements, whereas a random access iterator can do more, at the cost of possibly being bulkier and being unable to turn it into a simple pointer.
As a C++20 Concept, I would expect you can use it to specify a different algorithm if the container is contiguous. Perhaps exploiting cache locality.

Can I use std::upper_bound without an underlying container?

I have a range of integers [start, end] and a non-decreasing monotonic function f(i).
So conceptually, I have a non-decreasing sequence [f(start), f(start + 1), .. , f(end)].
Can I use std::upper_bound on that sequence to find the first element i in the range that holds f(i) > some_value ?
Conceptually, I'd like something like this:
std::upper_bound(start, end + 1, some_value, [&](int lhs, int rhs) {
    return f(lhs) < f(rhs);
});
But this doesn't compile because start and end + 1 do not meet the requirements of forward iterators.
The short answer is yes, since std::upper_bound works on iterators, not on containers. But iterators themselves are instances of a corresponding class (for example, std::vector<int>::iterator or whatnot).
If you construct some specific class that will meet the requirements of ForwardIterator not being actually bound to some sort of container, while still meaning something (for example, if you want to generate your sequence procedurally), it should work just fine.
Note that a simple integer will not do the trick. On the other hand, a class whose objects hold the value of your function for a particular argument value (with some additional batteries) will.
There are basically two answers:
Would it work by the standard or would it work with all practical implementations of the STL?
By the standard, as T.C. pointed out already, there are some strict requirements on iterators, especially that *it has to return a (possibly const) reference to value_type (which we would satisfy by returning the reference to a member of the iterator), but we also need that for it1 == it2, *it1 and *it2 are references bound to the same object, which is only possible if we have a distinct object for every number in the range.
If you want to use this idea in practice, I don't believe any implementation of std::upper_bound or similar methods actually relies on this reference equality, so you could just use a class that encapsulates an integer as an iterator, only overloading the necessary methods. As far as I can see, boost::irange fulfills these requirements.
As you can see, this is not strictly standard-compliant, but I see no reason why any implementation of binary search should rely on such strong requirements for the iterator, if the underlying 'storage' is const anyway.
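As a rough illustration of that idea (all names are hypothetical, and, as discussed above, this is not strictly standard-compliant because operator* returns a value), a minimal integer-backed iterator that is good enough for upper_bound in practice might look like:
#include <algorithm>
#include <iostream>
#include <iterator>

template <class F>
struct fn_iterator
{
    using iterator_category = std::random_access_iterator_tag;
    using value_type        = long;
    using difference_type   = long;
    using pointer           = const long*;
    using reference         = long;   // a value, not a true reference

    long i;
    const F* f;

    reference operator*() const { return (*f)(i); }
    fn_iterator& operator++()             { ++i; return *this; }
    fn_iterator& operator+=(long n)       { i += n; return *this; }
    fn_iterator  operator+(long n) const  { return {i + n, f}; }
    difference_type operator-(fn_iterator o) const { return i - o.i; }
    bool operator==(fn_iterator o) const { return i == o.i; }
    bool operator!=(fn_iterator o) const { return i != o.i; }
    bool operator<(fn_iterator o)  const { return i < o.i; }
};

int main()
{
    auto f = [](long i) { return i * i; };   // non-decreasing on [0, ...)
    using It = fn_iterator<decltype(f)>;
    It first{0, &f}, last{100, &f};

    // First i in [0, 100) with f(i) > 1000:
    It it = std::upper_bound(first, last, 1000L);
    std::cout << it.i << '\n';               // prints 32 (32*32 = 1024)
}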
No, not practically, but yes in practice, but no if you want to be practical.
No
upper_bound requires ForwardIterator. ForwardIterator requires that * returns an actual reference, and that if two iterators are equal then they refer to the same object.
Not practically
For a container-less iterator, this requires an insanely complex iterator that caches the values it returns in a shared global map of some kind. To make it half practical, note that the iterator requirements say very little about the lifetime of said reference; so you'd want to reference count and destroy said values as the iterators in question cease to exist.
Such a solution requires synchronization, global state, and is significantly more expensive and complex than something like boost::integer_range. No sane person would write this except as an exercise demonstrating why the standard needs to be fixed.
But yes in practice
No sane implementation of upper_bound actually requires that the iterators in question are full-scale forward iterators, barring one that does full concept checks to validate against the standard (and not against what the actual algorithm needs). Input iterators with stability on the values returned almost certainly do it. There is no such concept in the C++ standard, and forward iterator is the weakest iterator category in the standard that satisfies it.
This problem, of effectively demanding iterators be backed by containers, is a flaw in the standard in my opinion. Container-free iterators are powerful and useful, except they rarely technically work in standard containers.
Adding new iterator categories has proved problematic, because there is little way to do it without breaking existing code. They looked into it for contiguous iterators, and wrote it off as impractical (I don't know all the details of what they tried).
Adding new iterator concepts that are not backed by tags is more possible, but probably will have to wait until concepts are part of the C++ language and not just the standard; then experimenting with adding new concepts becomes something you can specify in C++ instead of in standardese, which makes it far easier.
But no if you want to be practical
This does, however, result in an ill-formed program, no diagnostic required. So consider whether it is worth it; it may actually be easier to reimplement upper_bound than to maintain a program whose every execution is undefined behavior, and whose every compile is at the mercy of a compiler upgrade.
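If that trade-off is unappealing, a hand-rolled binary search over plain indices sidesteps iterators entirely. A minimal sketch (names hypothetical):
#include <iostream>

// Returns the first i in [start, end] with f(i) > value, or end + 1 if none.
template <class F>
long upper_bound_index(long start, long end, long value, F f)
{
    long lo = start, hi = end + 1;      // search the half-open range [lo, hi)
    while (lo < hi)
    {
        long mid = lo + (hi - lo) / 2;
        if (f(mid) > value)
            hi = mid;                   // mid might be the answer
        else
            lo = mid + 1;               // everything up to mid is too small
    }
    return lo;
}

int main()
{
    auto f = [](long i) { return 2 * i; };                   // non-decreasing
    std::cout << upper_bound_index(0, 100, 41, f) << '\n';   // prints 21 (2*21 = 42 > 41)
}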

Why does it need it != obj.end(), although every iterator should know itself when to terminate?

The following question just shot through my head. For the C++ STL iterators, a common practice is to have something like:
for (iterator it=obj.begin(); it!=obj.end(); it++)
What I am wondering is that obj.begin() could have told it when to stop, which would make the for loop look like:
for (iterator it=obj.begin(); !it.end(); it++)
The benefit would be to make the iterator more self-contained, and the container class could get by without an iterator end() member.
Sometimes you want to do something other than iterate over the entire contents of a container. For example, you can create a pair of iterators that will iterate over only the first half of a container. So having a separate object to represent the end is more flexible since it allows the user more control of where to place the end of a range.
You are right that it is somewhat inconvenient for the most common case of iterating over everything. However, C++11 provides a range based for loop which makes looping over a whole container really easy, so like many things in programming, it's really just a matter of selecting the right construct to best express your intention.
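For example (a small sketch), iterating over only the first half of a vector just means choosing a different "end" iterator; the loop shape stays the same:
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3, 4, 5, 6};
    auto half = v.begin() + v.size() / 2;        // one past the first half

    for (auto it = v.begin(); it != half; ++it)  // same loop, custom end point
        std::cout << *it << ' ';                 // prints 1 2 3
    std::cout << '\n';
}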
If the iterator API were designed like that, a pointer wouldn't be a valid iterator (since a pointer obviously wouldn't have an end() method). So that would rule out the most straightforward way to implement iterators for data structures with contiguous memory.
Some libraries provide "java-style iterators" that work like you describe.
However, the big problem with this scheme (hasNext(), etc) is that such iterators are classes.
For example, the STL contains the <algorithm> header with functions like std::copy, std::generate, std::sort, std::lower_bound, std::fill, etc. All those routines use begin/end-style iterators. As a result, you can use pointers with those functions. If those functions operated on iterators that are classes (i.e. if they called hasNext() instead of != end() internally), then you wouldn't be able to pass pointers into std::sort and the like. In this scenario you would have to wrap everything in a class, wasting your time, and losing access to <algorithm> is not worth the minor convenience you'd get by adding an atEnd() method to an iterator class. That's probably the reason why iterators are compared to end().
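For instance, because the algorithms take a begin/end pair, plain pointers into a C array work directly (a small illustrative sketch):
#include <algorithm>
#include <iostream>

int main()
{
    int data[] = {3, 1, 2};
    int* first = data;
    int* last  = data + 3;       // "end" is one past the last element

    std::sort(first, last);      // pointers satisfy the random access iterator requirements
    for (int x : data) std::cout << x << ' ';   // prints 1 2 3
    std::cout << '\n';
}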
There is no requirement for iterator to "know" about the container. It could just know and care about an offset in a block of memory or a current node (or whatever is appropriate for the data structure being iterated on), without knowing anything about encompassing container. For example, a vector iterator could be implemented as a simple pointer, without knowing (by itself) where the vector ends.
Also, STL algorithms need to work on raw pointers as well, so iterators need to "mimic" pointers.
Ultimately it's just the decision made when the STL was invented (by Stepanov in the early 90s), and later ratified in the C++ standardization process, that an iterator would be a generalization of a pointer. From http://www.sgi.com/tech/stl/stl_introduction.html:
In the example of reversing a C array, the arguments to reverse are clearly of type double*. What are the arguments to reverse if you are reversing a vector, though, or a list? ... The answer is that the arguments to reverse are iterators, which are a generalization of pointers.
Iterators didn't have to be a generalization of pointers. The C++ standard libraries (and before that the STL) could in theory have used a different iteration model, in which an iteration would be represented either by a single iterator object or a single range object, instead of by a pair of iterators first and last.
I don't think there would be much performance difference. There certainly wouldn't be with modern C++ compilers. The STL (and the standard libraries based on it) always relied for performance on decent inlining by the compiler, and these classes wouldn't be any worse than what compilers already have to deal with in the bowels of the containers. Nor would there be any major difficulty in providing a simple wrapper that turns a pair of pointers into an iterator or range object.
Some people do prefer other iterator models - James Gosling for one (or if not him, whoever designed Java iterators). And some people prefer ranges (including plenty of C++ programmers: hence Boost.Range).
I suspect the authors of the STL and C++ liked the fact that STL-style iterators retained a kind of compatibility with C, that you could take a C algorithm that used pointers to operate on an array (or other range specified by the user with a pair of pointers), and transform it almost unchanged into a C++ algorithm that uses iterators to operate on a container (or other range). That's the kind of thinking that means your new language can be more easily assimilated by an existing user-base, which was one of the goals of C++ early on.
So questions like, "can we provide a means of testing for the end which is a few characters shorter, at the cost of making pointers no longer be iterators and making iterators larger?" probably would not have got much interest at the time. But that was then, and for example Andrei Alexandrescu has been arguing for a while now that in his opinion this is no longer the best choice, and that ranges are better than iterators.
Having the iterator work the way it does gives you the flexibility to work on subranges. You don't always have to start at begin() or end at end(). For example if you want to work with all the iterators containing a certain value you can use the return value of equal_range() for your beginning and ending.
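A brief sketch of working on such a subrange: equal_range hands back a begin/end pair covering just the elements equal to a given value.
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 2, 2, 3, 4};                  // must be sorted
    auto range = std::equal_range(v.begin(), v.end(), 2);  // pair of iterators

    for (auto it = range.first; it != range.second; ++it)  // only the 2s
        std::cout << *it << ' ';                           // prints 2 2 2
    std::cout << '\n';
}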
The D language has a range which incorporates both the beginning and end of an iterator range. It was also proposed for C++: http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/7e52451ed8eea31c
Knowing where your iterator's begin and end (and rbegin and rend) are is certainly useful, but the case you state is largely handled by C++11's range-based for loop:
for (auto& elem : obj)
{
    // Do something with elem;
}
It's much easier to get things done in C++11 without writing so much boilerplate!

Check whether iterator belongs to a list

Is there any way to check whether a given iterator belongs to a given list in C++?
The obvious but invalid approach
You can't simply iterate through the list, comparing each iterator value to your "candidate".
The C++03 Standard is vague about the validity of == applied to iterators from different containers (Mankarse's comment on Nawaz's answer links http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2948.html#446), and some compilers (e.g. VC++ 2005 in debug mode) warn if you do so, but despite all that it may actually work reliably depending on your compiler/libraries - check their documentation if you don't care about portability.
The C++11 Standard is very explicit, you can't compare iterators to different containers:
§ 24.2.5: The domain of == for forward iterators is that of iterators over the same underlying sequence.
So, the answers to this question that rely on operator== are questionable now, and invalid in future.
An oft-valid approach
What you can do is iterate along the list, comparing the address of each element (i.e. &*i) to the address of the object to which your other iterator points.
Mankarse's comment cautions that this might not work as intended for objects providing their own operator&. You could work around this using std::addressof, or, for C++03, Boost's version.
Martin's comment mentions that you have to assume the candidate iterator that you're testing list membership for is safely dereferenceable - i.e. not equal to an end() iterator on the container from which it came. As Steve points out - that's a pretty reasonable precondition and shouldn't surprise anyone.
(This is fine for all Standard containers as stored elements never have the same address, but more generally user-defined containers could allow non-equal iterators to address the same value object (e.g. supporting cycles or a "flyweight pattern" style optimisation), in which case this approach would fail. Still, if you write such a container you're probably in a position to design for safe iterator comparison.)
Implementation:
template <class IteratorA, class IteratorB, class IteratorC>
inline bool range_contains(IteratorA from, const IteratorB& end,
                           const IteratorC& candidate)
{
    while (from != end)
        if (&*from++ == &*candidate)
            return true;
    return false;
}
Notes:
This adopts the Standard library approach of accepting a range of iterator positions to search.
The types of each iterator are allowed to vary, as there are portability issues, e.g. containers where begin() returns an iterator but end() returns a const_iterator.
Iterators other than from are taken by const reference, as iterators can sometimes be non-trivial objects (i.e. too large to fit in a register, relatively expensive to copy). from is taken by value as it will be incremented through the range.
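A short usage sketch, assuming the range_contains template above is in scope (the lists and values here are just examples):
#include <iostream>
#include <list>

int main()
{
    std::list<int> a{1, 2, 3};
    std::list<int> b{4, 5, 6};

    auto it = a.begin();
    ++it;   // points at 2, and is safely dereferenceable (not end())

    std::cout << std::boolalpha
              << range_contains(a.begin(), a.end(), it) << '\n'   // true
              << range_contains(b.begin(), b.end(), it) << '\n';  // false
}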

Use of iterators over array indices

I just wanted to know what the main advantage of using iterators over array indices is. I have googled, but I am not getting the right answer.
I presume you are talking about when using a vector, right?
The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).
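A small sketch of what "put your iteration code in a template" can look like; the same function works for vector, list, deque, set, and so on, because it only needs iterators:
#include <iostream>
#include <list>
#include <vector>

template <class Container>
void print_all(const Container& c)
{
    for (auto it = c.begin(); it != c.end(); ++it)   // only iterators required
        std::cout << *it << ' ';
    std::cout << '\n';
}

int main()
{
    std::vector<int> v{1, 2, 3};
    std::list<int>   l{4, 5, 6};
    print_all(v);
    print_all(l);   // works even though list has no operator[]
}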
All of the standard containers provide the iterator concept. An iterator knows how to find the next element in the container, especially when the underlying structure isn't array-like. Array-style operator[] isn't provided by every container, so getting in the habit of using iterators will make more consistent-looking code, regardless of the container you choose.
You can abstract the collection implementation away.
To expand upon previous answers:
Writing a loop with operator[] constrains you to a container that supports [] and uses the same index/size type. Otherwise you'd need to rewrite every loop to change the container.
Even if your container supports [], it may not be optimal for sequential traversing. [] is basically a random-access operator, which is O(1) for vector but could be as bad as O(n) depending on the underlying container.
This is a minor point, but if you use iterators, your loop could be more easily moved to using the standard algorithms, e.g. std::for_each.
There are many data structures, e.g. hash tables and linked lists, that cannot be naturally or quickly indexed, but they are indeed traversable. Iterators act as an interface that lets you walk over any data structure without knowing the actual implementation of the source.
The STL contains algorithms, such as transform and for_each that operate on containers. They don't accept indices, but use iterators.
Iterators help hide the container implementation and allow the programmer to focus more on the algorithm. The for_each function can be applied to anything that supports a forward iterator.
As well as the points in other answers, iterators can also be faster (specifically compared to operator[]), since they are essentially iteration by pointer. If you do something like:
for (int i = 0; i < 10; ++i)
{
    my_vector[i].DoSomething();
}
Every iteration of the loop unnecessarily calculates my_vector.begin() + i. If you use iterators, incrementing the iterator means it's already pointing to the next element, so you don't need that extra calculation. It's a small thing, but can make a difference in tight loops.
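The same loop written with iterators (a sketch; Widget and my_vector are stand-ins for whatever the snippet above is working with):
#include <vector>

struct Widget { void DoSomething() {} };   // stand-in for the element type

void process(std::vector<Widget>& my_vector)
{
    // No repeated begin() + i arithmetic: the iterator already points at the element.
    for (auto it = my_vector.begin(); it != my_vector.end(); ++it)
        it->DoSomething();
}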
One other slight difference is that you can't use erase() on an element in a vector by index; you must have an iterator. No big deal since you can always do "vect.begin() + index" as your iterator, but there are other considerations. For example, if you do this then you must always check your index against size() and not some variable you assigned that value.
None of that is really too much worth worrying about but if given the choice I prefer iterator access for the reasons already stated and this one.
I would say it's more a matter of consistency and code reuse.
Consistency in that you will use all other containers with iterators
Code reuse in that algorithms written for iterators cannot be used with the subscript operator and vice versa... and the STL has lots of algorithms so you definitely want to build on it.
Finally, I'd like to say that even C arrays have iterators.
const Foo* someArray = /* ... */;
Foo* otherArray = new Foo[someArrayLength];
std::copy(someArray, someArray + someArrayLength, otherArray);
The iterator_traits class has been specialized so that pointers are a model of RandomAccessIterator.