Can I use std::upper_bound without an underlying container? - c++

I have a range of integers [start, end] and a non-decreasing monotonic function f(i).
So conceptually, I have a non-decreasing sequence [f(start), f(start + 1), .. , f(end)].
Can I use std::upper_bound on that sequence to find the first element i in the range that holds f(i) > some_value ?
Conceptually, I'd like something like this:
std::upper_bound(start, end + 1, some_value, [&](int lhs, int rhs) {
return f(lhs) < f(rhs);
});
But this doesn't compile because start and end + 1 do not meet the requirements of forward iterators.

The short answer is yes, since std::upper_bound works on iterators, not on containers. But iterators themselves are instances of corresponding class (for example, std::vector<int>::iterator or whatnot).
If you construct some specific class that will meet the requirements of ForwardIterator not being actually bound to some sort of container, while still meaning something (for example, if you want to generate your sequence procedurally), it should work just fine.
Note that simple integer will not do the trick. On the other hand, a class, whose objects hold the value of your function for a particular argument value (with some additional batteries), will.

There are basically two answers:
Would it work by the standard or would it work with all practical implementations of the STL?
By the standard, as T.C. pointed out already, there are some strict requirements on iterators, especially that *it has to return a (possibly const) reference to value_type (which we would satisfy by returning the reference to a member of the iterator), but we also need that for it1 == it2, *it1 and *it2 are references bound to the same object, which is only possible if we have a distinct object for every number in the range.
If you want to do use this idea in practice, I don't believe any implementation of std::upper_bound or similar methods actually relies on this reference equality, so you could just use a class that encapsulates an integer as an iterator, only overloading the necessary methods. As far as I can see, boost::irange fulfills these requirements
As you can see, this is not strictly standard-compliant, but I see no reason why any implementation of binary search should rely on such strong requirements for the iterator, if the underlying 'storage' is const anyway.

No, not practically, but yes in practice, but no if you want to be practical.
No
upper_bound requires ForwardIterator. ForwardIterator requires that * returns an actual reference, and that if two iterators are equal then they refer to the same object.
Not practically
For a container-less iterator, this requires an insanely complex iterator that caches the values it returns in a shared global map of some kind. To make it half practical, note that the iterator requirements say very little about the lifetime of said reference; so you'd want to reference count and destroy said values as the iterators in question cease to exist.
Such a solution requires synchronization, global state, and is significantly more expensive and complex than something like boost::integer_range. No sane person would write this except as an exercise demonstrating why the standard needs to be fixed.
But yes in practice
No sane implementation of upper_bound actually requires that the iterators in question are full-scale forward iterators, barring one that does full concept-checks to validate against the standard (and not against what the actual algorithm needs). Input iterators with stability on the values returned almost certainly does it. There is no such concept in the C++ standard, and forward iterator is the weakest iterator category in the standard that satifies it.
This problem, of effectively demanding iterators be backed by containers, is a flaw in the standard in my opinion. Container-free iterators are powerful and useful, except they rarely technically work in standard containers.
Adding new iterator categories has proved problematic, because there is little way to do it without breaking existing code. They looked into it for contiguous iterators, and wrote it off as impractical (I don't know all the details of what they tried).
Adding new iterator concepts that are not backed by tags is more possible, but probably will have to wait until concepts are part of the C++ language and not just the standard; then experimenting with adding new concepts becomes something you can specify in C++ instead of in standardese, which makes it far easier.
But no if you want to be practical
This does, however, result in an ill-formed program, no diagnostic required. So consider if it is worth it; it may actually be easier to reimplement upper_bound than maintain a program whose every excution is undefined behavior, and every compile at the mercy of a compiler upgrade.

Related

Can iterators from different containers be (re)assigned?

Iterators from different containers cannot be compared (see for example here:https://stackoverflow.com/a/4664519/225186) (or more technically, it doesn't need to make sense.)
This raises another question, can iterator from different ranges be assigned to each other?
Container A = ...;
Container B = ...;
auto it1 = A.begin();
it1 = B.begin(); // it1 first belonged to A, does it1 need to belong to B later
Is the last line required to work by the Iterator concept in some standard or in the accepted practices or in the upcoming std ranges?
Since equality and assigment are so intertwined, it seem that if equality (==) is not always well defined then assignment (=) doesn't need to be well defined either, and probably for similar underlying reasons.
The reason I ask is not purely academic, because a certain iterator implementation could have some "metadata" from the container, and that (depending on the implementation) may or may not be able to be reassigned or simply a waste to be reassigned.
(For example a stride information that is unique to A and doesn't agree with that of B.
Another example is when the iterator stores a reference to the original range.)
This could allow that when assignment is tried, a particular field (member) might be left untouched.
One could make the assignment work in some cases, and that may produce less surprises, but it could also limit the implementations, and the question is would it be really necessary to define/allow the assignment between iterators of different origin (provenance)?
UPDATE 2021:
The linked documents read things like:
[...] the term the domain of == is used in the ordinary mathematical
sense to denote the set of values over which == is (required to be)
defined. This set can change over time. Each algorithm places
additional requirements on the domain of == for the iterator values it
uses. These requirements can be inferred from the uses that algorithm
makes of == and !=.
So there is an implicit defined (by the algorithms) range of validity of ==
.
Another way to formulate this question is if the same caveats for the domain of applicability of == can be applied, by simple use of logic, to =.
The underlying idea is that defining == in isolation of = or vise versa doesn't make sense.
(and also because I found a motivating case).
If you check out cppreference.com for the container you are interested you can find out the requirements for its iterators.
If we look at std::vector for example, its iterators are specified to be LegacyRandomAccessIterator. If you follow the hierarchy of definitions up from there to the base LegacyIterator you'll see that iterators are required to be CopyAssignable which means you must be able to assign one iterator to another of the same type.
All of the standard library containers use iterators derived from LegacyIterator, containers from other libraries or containers are free to ignore these requirements but it would be quite surprising to users if iterators weren't CopyAssignable and even more surprising if iterators were only not CopyAssignable between containers of the same type as that would potentially only be a runtime failure and not a compile time failure.

why does it need it!=obj.end() although every iterator should know itself when to terminate?

the following question just shot through my head. For the c++ stl iterators, a common practice is to have something like:
for (iterator it=obj.begin(); it!=obj.end(); it++)
what I am wondering is actually that obj.begin() could have told "it" when to stop which would make the for loop look like:
for (iterator it=obj.begin(); !it.end(); it++)
The benefit would be to make the iterator more self contained, and one could save (iterator end()) in the container class.
Sometimes you want to do something other than iterate over the entire contents of a container. For example, you can create a pair of iterators that will iterate over only the first half of a container. So having a separate object to represent the end is more flexible since it allows the user more control of where to place the end of a range.
You are right that it is somewhat inconvenient for the most common case of iterating over everything. However, C++11 provides a range based for loop which makes looping over a whole container really easy, so like many things in programming, it's really just a matter of selecting the right construct to best express your intention.
If the iterator API would be designed like that, a pointer wouldn't be a valid iterator (since a pointer obviously wouldn't have an end()) method. So that would rule out the most straight-forward way to implement iterators for data structures with contiguous memory.
Some libraries provide "java-style iterators" that work like you describe.
However, the big problem with this scheme (hasNext(), etc) is that such iterators are classes.
For example, STL contains <algorithm> header that contains functions like std::copy, std::generate, std::sort, std::lower_bound, std::fill etc. All those routines use begin/end style iterators. As a result, you can use pointers with those functions. If those function were operating on iterators that are classes (i.e. if they called hasNext() instead of != end() internally), then you wouldn't be able to pass pointers into std::sort and such. In this scenario you would have to wrap everything into class, wasting your time, and losing access to <algorithm> is not worth minor convenience you'd get by adding atEnd() method to iterator class. That's probably the reason why iterators are compared to the end().
There is no requirement for iterator to "know" about the container. It could just know and care about an offset in a block of memory or a current node (or whatever is appropriate for the data structure being iterated on), without knowing anything about encompassing container. For example, a vector iterator could be implemented as a simple pointer, without knowing (by itself) where the vector ends.
Also, STL algorithms need to work on raw pointers as well, so iterators need to "mimic" pointers.
Ultimately it's just the decision made when the STL was invented (by Stepanov in the early 90s), and later ratified in the C++ standardization process, that an iterator would be a generalization of a pointer. From http://www.sgi.com/tech/stl/stl_introduction.html:
In the example of reversing a C array, the arguments to reverse are
clearly of type double*. What are the arguments to reverse if you are
reversing a vector, though, or a list? ... The answer is that the
arguments to reverse are iterators, which are a generalization of
pointers.
Iterators didn't have to be a generalization of pointers. The C++ standard libraries (and before that the STL) could in theory have used a different iteration model, in which an iteration would be represented either by a single iterator object or a single range object, instead of by a pair of iterators first and last.
I don't think there would be much performance difference. There certainly wouldn't be with modern C++ compilers. The STL (and the standard libraries based on it) always relied for performance on decent inlining by the compiler, and these classes wouldn't be any worse than what compilers already have to deal with in the bowels of the containers. Nor would there be any major difficulty in providing a simple wrapper that turns a pair of pointers into an iterator or range object.
Some people do prefer other iterator models - James Gosling for one (or if not him, whoever designed Java iterators). And some people prefer ranges (including plenty of C++ programmers: hence Boost.Range).
I suspect the authors of the STL and C++ liked the fact that STL-style iterators retained a kind of compatibility with C, that you could take a C algorithm that used pointers to operate on an array (or other range specified by the user with a pair of pointers), and transform it almost unchanged into a C++ algorithm that uses iterators to operate on a container (or other range). That's the kind of thinking that means your new language can be more easily assimilated by an existing user-base, which was one of the goals of C++ early on.
So questions like, "can we provide a means of testing for end which is a few characters shorter, at the cost of making pointers no longer be iterators and making iterators larger" probably would not have got much interest at the time. But that was then, and for example Andrei Alexandrescu has been banging on for a while now that in his opinion this is not longer the best choice, and that ranges are better than iterators.
Having the iterator work the way it does gives you the flexibility to work on subranges. You don't always have to start at begin() or end at end(). For example if you want to work with all the iterators containing a certain value you can use the return value of equal_range() for your beginning and ending.
The D language has a range which incorporates both the beginning and end of an iterator range. It was also proposed for C++: http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/7e52451ed8eea31c
Knowing where your iterator's begin and end (and rbegin and rend) are is certainly useful but the case you state is somewhat handled by C++11's range-based for-loop:
for (auto it: obj)
{
// Do something with *it;
}
It's much easier to get things done in C++11 without writing so much boiler-plate!

How to handle multiple iterator types

I have a custom datastructure, which can be accessed in multiple ways. I want to try to keep this datastructure to keep to STL-standards as well as possible. So I already have lot's of typedefs, which give template parameters the STL-names. This is business as usual for me by now.
However I am not so sure how to correctly add iterators to my datastructure. The main problem I am facing is, that there would be multiple iteration policies over the datastructure. The easiest use case is iterating over all elements, which would be handled well by STL-Conforming iterators over the datastructure. However one might also want to access elements, which are somehow similar to a given key. I would also like to iterate over all these similar elements in a way which I can interface with the STL.
These are the ideas I have thought about so far:
Provide only one type of iterator:
This is basicly what std::map does. The start and end iterators for a subrange are provided by std::map::lower_bound() and std::map::upper_bound().
However this works well, because the iterators returned by begin(), end(), lower_bound() and upper_bound() are compatible, i.e. the operator==() can be given a very well defined meaning on these. In my case this would be hard to get right, or might even be impossible to give some clear semantics. For example I probably would get some cases where it1==it2 but ++it1!=++it2. I am not sure if this is allowed by the STL.
Provide multiple types of iterators:
Much easier to provide clean operator==() semantics. Nasty on the other hand because it enlarges the number of types.
Provide one type of iterator and use stl::algorithms for specialized access
I am not sure if this is possible at all. The iteration state should be kept by the iterator somehow (either directly or in a Memento). Using this approach would mean to specialize all stl::algorithms and access the predicate directly in the specialization. Most likely impossible, and if possible a very bad idea.
Right now I am mostly opting for Version 1, i.e. to provide only one type of iterator at all. However since I am not clear on how to clean up the semantics, I have not yet decided.
How would you handle this?
Standard containers support two iteration policies with two iterator types: ::iterator and ::reverse_iterator. You can convert between the two using the constructor of std::reverse_iterator, and its member function base().
Depending how similar your iteration policies are, it may or may not be easy to provide conversions to different iterator types. The idea is that the result should point at the "equivalent position" in the iteration policy of the destination type. For reverse iterators, this equivalence is defined by saying that if you insert at that point, the result is the same. So if rit is a reverse iterator, vec.insert(rit.base(), ...) inserts an element "before" rit in the reverse iteration, that is to say after the element pointed to by rit in the container. This is quite fiddly, and will only get worse when the iteration policies are completely unrelated. But if all of your iterator types are (or can be made to look like) wrappers around the "normal" iterator that goes over all elements, then you can define conversions in terms of that underlying iterator position.
You only actually need conversions if there are member functions that add or remove elements of the container, because you probably don't want to have to provide a separate overload for each iterator type (just like standard containers don't define insert and erase for reverse iterators). If iterators are used solely to point at elements, then most likely you can do without them.
If the different iteration policies are all iterating in the normal order over a subset of the elements, then look at boost::filter_iterator.
I probably would get some cases where it1==it2 but ++it1!=++it2. I am
not sure if this is allowed by the STL.
If I understand correctly, you got it1 by starting at thing.normal_begin(), and you got it2 by starting at thing.other_policy_begin(). The standard containers don't define the result of comparing iterators of the same type that belong to different ranges, so if you did use a common type, then I think this would be fine provided the documentation makes it clear that although operator== does happen to work, the ranges are separate according to where the iterator came from.
For example, you could have a skip_iterator which takes as a constructor parameter the number of steps it should move forward each time ++ is called. Then you could either include that integer in the comparison, so that thing.skip_begin(1) != thing.skip_begin(2), or you could exclude it so that thing.skip_begin(1) == thing.skip_begin(2) but ++(++(thing.skip_begin(1))) == ++(thing.skip_begin(2)). I think either is fine provided it's documented, or you could document that comparing them is UB unless they came from the same starting point.
Why are more types a problem? It does not necessarily mean much more code. For instance, you could make you iterator-type a template that takes an iteration-policy as template-parameter. The iteration-policy could then provide the implementation of the iteration:
struct iterate_all_policy {
iterate_all_policy(iterator<iterate_all_policy> & it) : it(it) {}
void advance() { /* implement specific advance here */ }
private:
iterator<iterate_all_policy> & it;
}
You will probably have to make the iteration-policy-classes friends of the iterator-types.

Check whether iterator belongs to a list

Is there any way to check whether a given iterator belongs to a given list in C++?
The obvious but invalid approach
You can't simply iterate through the list, comparing each iterator value to your "candidate".
The C++03 Standard is vague about the validity of == applied to iterators from different containers (Mankarse's comment on Nawaz's answer links http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2948.html#446), some compilers (eg. VC++2005 debug mode) warn if you do so, but despite all that it may actually work reliably depending on your compiler/libraries - check its documentation if you don't care about portability.
The C++11 Standard is very explicit, you can't compare iterators to different containers:
ยง 24.2.5 The domain of == for forward iterators is that of iterators over the same underlying sequence.
So, the answers to this question that rely on operator== are questionable now, and invalid in future.
An oft-valid approach
What you can do is iterate along the list, comparing the address of the elements (i.e. &*i) to the address of the object to which your other iterate points.
Mankarse's comment cautions that this might not work as intended for objects providing their own operator&. You could work around this using std::addressof, or for C++03 boost's version
Martin's comment mentions that you have to assume the candidate iterator that you're testing list membership for is safely dereferenceable - i.e. not equal to an end() iterator on the container from which it came. As Steve points out - that's a pretty reasonable precondition and shouldn't surprise anyone.
(This is fine for all Standard containers as stored elements never have the same address, but more generally user-defined containers could allow non-equal iterators to address the same value object (e.g. supporting cycles or a "flyweight pattern" style optimisation), in which case this approach would fail. Still, if you write such a container you're probably in a position to design for safe iterator comparison.)
Implementation:
template <class IteratorA, class IteratorB, class IteratorC>
inline bool range_contains(IteratorA from, const IteratorB& end,
const IteratorC& candidate)
{
while (from != end)
if (&*from++ == &*candidate)
return true;
return false;
}
Notes:
This adopts the Standard library approach of accepting a range of iterator positions to search.
The types of each iterator are allowed to vary, as there are portability issues, e.g. containers where begin() returns an iterator but end() returns a const_iterator.
Iterators other than from are taken by const reference, as iterators can sometimes be non-trivial objects (i.e. too large to fit in a register, relatively expensive to copy). from is needed by value as it will be incremented through the range.

STL-Like range, What could go wrong if I did this?

I am writing (as a self-teaching exercise) a simple STL-Like range. It is an Immutable-Random-Access "container". My range, keeps only the its start element, the the number of elements and the step size(the difference between two consecutive elements):
struct range
{
...
private:
value_type m_first_element, m_element_count, m_step;
};
Because my range doesn't hold the elements, it calculates the desired element using the following:
// In the standards, the operator[]
// should return a const reference.
// Because Range doesn't store its elements
// internally, we return a copy of the value.
value_type operator[](size_type index)
{
return m_first_element + m_step*index;
}
As you can see, I am not returning a const reference as the standards say. Now, can I assume that a const reference and a copy of the element are the same in terms of using the non-mutating algorithms in the standard library?
Any advice about the subject is greatly appreciated.
#Steve Jessop: Good point that you mentioned iterators.
Actually, I used sgi as my reference. At the end of that page, it says:
Assuming x and y are iterators from the same range:
Invariants Identity
x == y if and only if &*x == &*y
So, it boils down to the same original question I've asked actually :)
The standard algorithms don't really use operator[], they're all defined in terms of iterators unless I've forgotten something significant. Is the plan to re-implement the standard algorithms on top of operator[] for your "ranges", rather than iterators?
Where non-mutating algorithms do use iterators, they're all defined in terms of *it being assignable to whatever it needs to be assignable to, or otherwise valid for some specified operation or function call. I think all or most such ops are fine with a value.
The one thing I can think of, is that you can't pass a value where a non-const reference is expected. Are there any non-mutating algorithms which require a non-const reference? Probably not, provided that any functor parameters etc. have enough const in them.
So sorry, I can't say definitively that there are no odd corners that go wrong, but it sounds basically OK to me. Even if there are any niggles, you may be able to fix them with very slight differences in the requirements between your versions of the algorithms and the standard ones.
Edit: a second thing that could go wrong is taking pointers/references and keeping them too long. As far as I can remember, standard algorithms don't keep pointers or references to elements - the reason for this is that it's containers which guarantee the validity of pointers to elements, iterator types only tell you when the iterator remains valid (for instance a copy of an input iterator doesn't necessarily remain valid when the original is incremented, whereas forward iterators can be copied in this way for multi-pass algorithms). Since the algorithms don't see containers, only iterators, they have no reason that I can think of to be assuming the elements are persistent.
Items in STL containers are expected to be copied around all the time; think about when a vector has to be reallocated, for example. So, your example is fine, except that it only works with random iterators. But I suspect the latter is probably by design. :-P
Do you want your range to be usable in STL algorithms? Wouldn't it be better off with the first and last elements? (Considering the fact that end() is oft required/used, you will have to pre-calculate it for performance.) Or, are you counting on the contiguous elements (which is my second point)?
Since you put "container" in "quotes" you can do whatever you want.
STL type things (iterators, various member functions on containers..) return references because references are lvalues, and certain constructs (ie, myvec[i] = otherthing) can compile then. Think of operator[] on std::map. For const references, it's not a value just to avoid the copy, I suppose.
This rule is violated all the time though, when convenient. It's also common to have iterator classes store the current value in a member variable purely for the purpose of returning a reference or const reference (yes, this reference would be invalid if the iterator is advanced).
If you're interested in this sort of stuff you should check out the boost iterator library.