What's wrong with passing C++ iterator by reference?

What's wrong with passing C++ iterator by reference? - c++

I've written a few functions with a prototype like this:
template <typename input_iterator>
int parse_integer(input_iterator &begin, input_iterator end);
The idea is that the caller would provide a range of characters, and the function would interpret the characters as an integer value and return it, leaving begin at one past the last-used character. For example:
std::string sample_text("123 foo bar");
std::string::const_iterator p(sample_text.begin());
std::string::const_iterator end(sample_text.end());
int i = parse_integer(p, end);
This would leave i set to 123 and p "pointing" at the space before foo.
I've since been told (without explanation) that it's bad form to pass an iterator by reference. Is it bad form? If so, why?

There is nothing really wrong, but it will certainly limit the use of the template. You won't be able to just put an iterator returned by something else or generated like v.begin(), since those will be temporaries. You will always first have to make a local copy, which is some kind of boilerplate not really nice to have.
One way is to overload it:
int parse_integer(input_iterator begin, input_iterator end,
input_iterator &newbegin);
template<typename input_iterator>
int parse_integer(input_iterator begin, input_iterator end) {
return parse_integer(begin, end, begin);
}
Another option is to have an output iterator where the number will be written into:
template<typename input_iterator, typename output_iterator>
input_iterator parse_integer(input_iterator begin, input_iterator end,
output_iterator out);
You will have the return value to return the new input iterator. And you could then use a inserter iterator to put the parsed numbers into a vector or a pointer to put them directly into an integer or an array thereof if you already know the amount of numbers.
int i;
b = parse_integer(b, end, &i);
std::vector<int> numbers;
b = parse_integer(b, end, std::back_inserter(numbers));

In general:
If you pass a non-const reference, the caller doesn't know if the iterator is being modified.
You could pass a const reference, but usually iterators are small enough that it gives no advantage over passing by value.
In your case:
I don't think there's anything wrong with what you do, except that it's not too standard-esque regarding iterator usage.

When they say "don't pass by reference" maybe that's because it's more normal/idiomatic to pass iterators as value parameters, instead of passing them by const reference: which you did, for the second parameter.
In this example however you need to return two values: the parsed int value, and, the new/modified iterator value; and given that a function can't have two return codes, coding one of the return codes as a non-const reference is IMO normal.
An alternative would be to code it something like this:
//Comment: the return code is a pair of values, i.e. the parsed int and etc ...
pair<int, input_iterator> parse(input_iterator start, input_iterator end)
{
}

In my opinion, if you want to do this the argument should be a pointer to the iterator you'll be changing. I'm not a big fan of non-const reference arguments because they hide the fact that the passed parameter might change. I know there's a lot of C++ users who disagree with my opinion on this - and that's fine.
However, in this case it's so common for iterators to be treated as value arguments that I think it's a particularly bad idea to pass iterators by non-const reference and modify the passed iterator. It just goes against the idiomatic way iterators are usually used.
Since there is a great way to do what you want that doesn't have this problem, I think you should use it:
template <typename input_iterator>
int parse_integer(input_iterator* begin, input_iterator end);
Now a caller would have to do:
int i = parse_integer(&p, end);
And it'll be obvious that the iterator can be changed.
By the way, I also like litb's suggestion of returning the new iterator and putting the parsed values into a location specified by an output iterator.

In this context, I think that passing an iterator by reference is perfectly sensible, as long as it's well-documented.
It's worth noting that your approach (passing an iterator by reference to keep track of where you are when tokenizing a stream) is exactly the approach that is taken by boost::tokenizer. In particular, see the definition of the TokenizerFunction Concept. Overall, I find boost::tokenizer to be pretty well designed and well thought out.

I think the Standard Library algorithms pass iterators by value exclusively (someone will now post an obvious exception to this) - this may be the origin of the idea. Of course, nothing says that your own code has to look like the Standard Library!

Your function declaration's second parameter is missing the reference, is it?
Anyway, back to your question: No, I haven't ever read anything that says you should not pass iterators by reference. The problem with references is that they allow you to change the referenced object. In this case, if you are to change the iterator, you are potentially screwing up the entire sequence beyond that point thereby rendering further processing impossible.
Just one suggestion: type your parameters carefully.

Related

How to use `boost::range` iterators with standard iterators

I have functions that take in std::vector iterators, as in
typedef std::vector<Point> Points;
Points ConvexHull(Points::const_iterator first, Points::const_iterator last);
I usually pass the std iterators to them, but occasionally I need to work with boost iterators, such as boost::join's range iterator. How should I change the parametrizations of my functions, ideally without templates, so that they accept both iterators? Moreover, how do I indicate in each type which iterator concepts I need?
I tried looking at the boost::range documentation but it's overwhelmingly confusing for me and I don't know where to start.
For example, I couldn't find the difference between boost::range_details::any_forward_iterator_interface and boost::range_details::any_forward_iterator_wrapper, and whether I should use either of those to specify that I need a forward iterator.
Edit:
If I use boost::any_range, how can I pass non-const lvalue references?
For example:
template<typename T>
using Range = boost::any_range<T, boost::random_access_traversal_tag,
T, std::ptrdiff_t>;
f(Range<Point> &points); // defined elsewhere
// -------------
vector<Point> vec;
f(vec); // error; cannot bind non-const lvalue reference to unrelated type

boost-range has the any_range for this purpose and it suits both purposes for your case.
https://www.boost.org/doc/libs/1_60_0/libs/range/doc/html/range/reference/ranges/any_range.html
From your example it would look like this:
#include <boost/range/any_range.hpp>
typedef boost::any_range<Point,
boost::bidirectional_traversal_tag,
Point,
std::ptrdiff_t
> PointRange;

You should strongly consider using a template. Doing so let's the compiler keep useful information about what operations are actually occurring, which greatly helps it generate optimised output. The std:: convention is to name the type parameter for the concept required. E.g.
template< class BidirIt, class UnaryPredicate > // anything bidirectional (which includes random access)
BidirIt std::partition( BidirIt first, BidirIt last, UnaryPredicate p );
If you really don't want a template, you still shouldn't name anything in a detail namespace. Something like
#include <boost/range/any_range.hpp>
using PointRange = boost::any_range<Point, boost::random_access_traversal_tag>; // or another traversal tag.
using PointIterator = PointRange::iterator;
You will likely need to pass PointRange & less frequently than, say, int *&. Almost always passing by value is the correct behaviour. It is cheap to copy, as it holds a begin and end iterator from the Range that it was constructed from, nothing more.

Should conversions from non-const iterator to const iterator be avoided?

I was attempting to compare a const iterator to a non-const iterator but was not certain whether it was okay, so I looked it up. I found out it is OK due to an implicit conversion of non-const iterator to const iterator. However, I was wondering whether you should prefer to not compare these iterators, in order to avoid this conversion. For example,
begin_iterator = buf.begin(); end_iterator = buf.cend();
for (; begin_iterator != end_iterator; ++begin_iterator) { ... }
Regularly, I would consider this OK as const means read-only, which is fine. However, I am uncertain about the (unnecessary) conversion.

It's always fine to convert a non const iterator to a const iterator, just as it's fine to convert a non const pointer to a const pointer.
The opposite is not something you want to do, it can lead to crash (because the data is stored in const only space, for instance). So never do it.

While this might work in this case, it's not the standard pattern for creating ranges. In particular, the standard algorithms expect two iterators of the same type, so, for example, std::fill(whatever.begin(), whatever.cend(), 3) would not compile.
As a result, the code in the question creates a maintenance problem. Suppose a maintainer realizes that the for loop is doing something that can be done with a standard algorithm. The obvious transformation is to replace the for loop with the algorithm. So
begin_iterator = buf.begin(); end_iterator = buf.cend();
for (; begin_iterator != end_iterator; ++begin_iterator) { ... }
becomes
begin_iterator = buf.begin(); end_iterator = buf.cend();
std::whatever(begin_iterator, end_iterator);
but that doesn't compile, so the maintainer has to hunt around and discover that the two iterators are not the same type, and then figure out whether it's okay to change the type of the end iterator to match the begin iterator. That means examining all of the places where the end iterator is used, to determine why it's a different type and whether that matters.
So the real issue is, what problem does this solve, and is the cost worth it? In general, the answer to the first is "nothing important" and the answer to the second is "no".

C++ STL - Why treat a function that returns an iterator as a void function?

The STL has many functions that return iterators. For example, the STL list function erase returns an iterator (both C++98 and C++11). Nevertheless, I see it being used as a void function. Even the cplusplus.com site has an example that contains the following code:
mylist.erase(it1, it2);
which does not return an iterator. Shouldn't the proper syntax be as follows?
std::list<int>::iterator iterList = mylist.erase(it1, it2)?

You are not forced to use a return value in C++. Normally the returned iterator should be useful to have a new valid iterator to that container. But syntactically it's correct. Just keep in mind that after the erasure, it1 and it2 will not be valid any more.

which does not return an iterator. Shouldn't the proper syntax be as
follows?
It does return an iterator, but just as with any other function you can ignore the returned value. If you just want to erase an element and dont care about the returned iterator, then you may simply ignore it.
Actually ignoring the return value is more common than you might expect. Maybe the most common place where return values are ignored is when the only purpose of the return value is to enable chaining. For example the assignment operator is usually written as
Foo& operator=(const Foo& other){
/*...*/
return *this;
}
so that you can write something like a = b = c. However, when you only write b = c you usually ignore the value returned by that call. Also note, that when you write b = c this does return a value, but if you just want to make this assignment, then you have no other choice than to ignore the return value.

The actual reason is far more banal than you might think. If you couldn't ignore a return value, many of the functions like .erase() would need to be split in two versions: an .erase() version returning void and another .erase_and_return() version which returned the iterator. What would you gain from this?

Return values are sometimes that are sometimes useful. If they are not useful, you don't have to use them.
All container.erase functions return the iterator after the newly erased range. For non-node-based containers this is often (but not always) useful because the iterators at and after the range are no longer valid.
For node-based containers this is usually useless, as the 2nd iterator passed in remains valid even after the erase operation.
Regardless, both return that iterator. This permits code that works on a generic container to not have to know if the container maintains valid iterators after the erase or not; it can just store the return value and have a valid iterator to after-the-erase.
Iterators in C++ standard containers are all extremely cheap to create and destroy and copy; in fact, they are in practice so cheap that compilers can eliminate them entirely if they aren't used. So returning an iterator that isn't used can have zero run time cost.
A program that doesn't use this return value here can be a correct program, both semantically and syntactically. At the same time, other semantically and syntactically correct programs will require that you use that return value.
Finally,
mylist.erase(it1, it2);
this does return an iterator. The iterator is immediately discarded (the return value only exists as an unnamed temporary), and compilers are likely to optimize it out of existence.
But just because you don't store a return value, doesn't mean it isn't returned.

Why does operator ++ return a non-const value?

I have read Effective C++ 3rd Edition written by Scott Meyers.
Item 3 of the book, "Use const whenever possible", says if we want to prevent rvalues from being assigned to function's return value accidentally, the return type should be const.
For example, the increment function for iterator:
const iterator iterator::operator++(int) {
...
}
Then, some accidents is prevented.
iterator it;
// error in the following, same as primitive pointer
// I wanted to compare iterators
if (it++ = iterator()) {
...
}
However, iterators such as std::vector::iterator in GCC don't return const values.
vector<int> v;
v.begin()++ = v.begin(); // pass compiler check
Are there some reasons for this?

I'm pretty sure that this is because it would play havoc with rvalue references and any sort of decltype. Even though these features were not in C++03, they have been known to be coming.
More importantly, I don't believe that any Standard function returns const rvalues, it's probably something that wasn't considered until after the Standard was published. In addition, const rvalues are generally not considered to be the Right Thing To Do™. Not all uses of non-const member functions are invalid, and returning const rvalues is blanketly preventing them.
For example,
auto it = ++vec.begin();
is perfectly valid, and indeed, valid semantics, if not exactly desirable. Consider my class that offers method chains.
class ILikeMethodChains {
public:
int i;
ILikeMethodChains& SetSomeInt(int param) {
i = param;
return *this;
}
};
ILikeMethodChains func() { ... }
ILikeMethodChains var = func().SetSomeInt(1);
Should that be disallowed just because maybe, sometimes, we might call a function that doesn't make sense? No, of course not. Or how about "swaptimization"?
std::string func() { return "Hello World!"; }
std::string s;
func().swap(s);
This would be illegal if func() produced a const expression - but it's perfectly valid and indeed, assuming that std::string's implementation does not allocate any memory in the default constructor, both fast and legible/readable.
What you should realize is that the C++03 rvalue/lvalue rules frankly just don't make sense. They are, effectively, only part-baked, and the minimum required to disallow some blatant wrongs whilst allowing some possible rights. The C++0x rvalue rules are much saner and much more complete.

If it is non-const, I expect *(++it) to give me mutable access to the thing it represents.
However, dereferencing a const iterator yields only non-mutable access to the thing it represents. [edit: no, this is wrong too. I really give up now!]
This is the only reason I can think of.
As you rightly point out, the following is ill-formed because ++ on a primitive yields an rvalue (which can't be on the LHS):
int* p = 0;
(p++)++;
So there does seem to be something of an inconsistency in the language here.

EDIT: This is not really answering the question as pointed in the comments. I'll just leave the post here in the case it's useful anyhow...
I think this is pretty much a matter of syntax unification towards a better usable interface. When providing such member functions without differentiating the name and letting only the overload resolution mechanism determine the correct version you prevent (or at least try to) the programmer from making const related worries.
I know this might seem contradictory, in particular given your example. But if you think on most of the use cases it makes sense. Take an STL algorithm like std::equal. No matter whether your container is constant or not, you can always code something like bool e = std::equal(c.begin(), c.end(), c2.begin()) without having to think on the right version of begin and end.
This is the general approach in the STL. Remember of operator[]... Having in the mind that the containers are to be used with the algorithms, this is plausible. Although it's also noticeable that in some cases you might still need to define an iterator with a matching version (iterator or const_iterator).
Well, this is just what comes up to my mind right now. I'm not sure how convincing it is...
Side note: The correct way to use constant iterators is through the const_iterator typedef.

How do I escape the const_iterator trap when passing a const container reference as a parameter

I generally prefer constness, but recently came across a conundrum with const iterators that shakes my const attitude annoys me about them:
MyList::const_iterator find( const MyList & list, int identifier )
{
// do some stuff to find identifier
return retConstItor; // has to be const_iterator because list is const
}
The idea that I'm trying to express here, of course, is that the passed in list cannot/willnot be changed, but once I make the list reference const I then have to use 'const_iterator's which then prevent me from doing anything with modifing the result (which makes sense).
Is the solution, then, to give up on making the passed in container reference const, or am I missing another possibility?
This has always been my secret reservation about const: That even if you use it correctly, it can create issues that it shouldn't where there is no good/clean solution, though I recognize that this is more specifically an issue between const and the iterator concept.
Edit: I am very aware of why you cannot and should not return a non-const iterator for a const container. My issue is that while I want a compile-time check for my container which is passed in by reference, I still want to find a way to pass back the position of something, and use it to modify the non-const version of the list. As mentioned in one of the answers it's possible to extract this concept of position via "advance", but messy/inefficient.

If I understand what you're saying correctly, you're trying to use const to indicate to the caller that your function will not modify the collection, but you want the caller (who may have a non-const reference to the collection) to be able to modify the collection using the iterator you return. If so, I don't think there's a clean solution for that, unless the container provides a mechanism for turning a const interator into a non-const one (I'm unaware of a container that does this). Your best bet is probably to have your function take a non-const reference. You may also be able to have 2 overloads of your function, one const and one non-const, so that in the case of a caller who has only a const reference, they will still be able to use your function.

It's not a trap; it's a feature. (:-)
In general, you can't return a non-const "handle" to your data from a const method. For example, the following code is illegal.
class Foo
{
public:
int& getX() const {return x;}
private:
int x;
};
If it were legal, then you could do something like this....
int main()
{
const Foo f;
f.getX() = 3; // changed value of const object
}
The designers of STL followed this convention with const-iterators.
In your case, what the const would buy you is the ability to call it on const collections. In which case, you wouldn't want the iterator returned to be modifiable. But you do want to allow it to be modifiable if the collection is non-const. So, you may want two interfaces:
MyList::const_iterator find( const MyList & list, int identifier )
{
// do some stuff to find identifier
return retConstItor; // has to be const_iterator because list is const
}
MyList::iterator find( MyList & list, int identifier )
{
// do some stuff to find identifier
return retItor;
}
Or, you can do it all with one template function
template<typename T>
T find(T start, T end, int identifier);
Then it will return a non-const iterator if the input iterators are non-const, and a const_iterator if they are const.

What I've done with wrapping standard algorithms, is have a metaobject for determining the type of container:
namespace detail
{
template <class Range>
struct Iterator
{
typedef typename Range::iterator type;
};
template <class Range>
struct Iterator<const Range>
{
typedef typename Range::const_iterator type;
};
}
This allows to provide a single implementation, e.g of find:
template <class Range, class Type>
typename detail::Iterator<Range>::type find(Range& range, const Type& value)
{
return std::find(range.begin(), range.end(), value);
}
However, this doesn't allow calling this with temporaries (I suppose I can live with it).
In any case, to return a modifiable reference to the container, apparently you can't make any guarantees what your function does or doesn't do with the container. So this noble principle indeed breaks down: don't get dogmatic about it.
I suppose const correctness is more of a service for the caller of your functions, rather that some baby-sitting measure that is supposed to make sure you get your simple find function right.
Another question is: how would you feel if I defined a following predicate and then abused the standard find_if algorithm to increment all the values up to the first value >= 3:
bool inc_if_less_than_3(int& a)
{
return a++ < 3;
}
(GCC doesn't stop me, but I couldn't tell if there's some undefined behaviour involved pedantically speaking.)
1) The container belongs to the user. Since allowing modification through the predicate in no way harms the algorithm, it should be up to the caller to decide how they use it.
2) This is hideous!!! Better implement find_if like this, to avoid this nightmare (best thing to do, since, apparently, you can't choose whether the iterator is const or not):
template <class Iter, class Pred>
Iter my_find_if(Iter first, Iter last, Pred fun)
{
while (first != last
&& !fun( const_cast<const typename std::iterator_traits<Iter>::value_type &>(*first)))
++first;
return first;
}

Although I think your design is a little confusing (as others have pointed iterators allow changes in the container, so I don't see your function really as const), there's a way to get an iterator out of a const_iterator. The efficiency depends on the kind of iterators.
#include <list>
int main()
{
typedef std::list<int> IntList;
typedef IntList::iterator Iter;
typedef IntList::const_iterator ConstIter;
IntList theList;
ConstIter cit = function_returning_const_iter(theList);
//Make non-const iter point to the same as the const iter.
Iter it(theList.begin());
std::advance(it, std::distance<ConstIter>(it, cit));
return 0;
}

Rather than trying to guarantee that the list won't be changed using the const keyword, it is better in this case to guarantee it using a postcondition. In other words, tell the user via comments that the list won't be changed.
Even better would be using a template that could be instantiated for iterators or const_iterators:
template <typename II> // II models InputIterator
II find(II first, int identifier) {
// do stuff
return iterator;
}
Of course, if you're going to go to that much trouble, you might as well expose the iterators of MyList to the user and use std::find.

If you're changing the data directed by the iterator, you're changing the list.
The idea that I'm trying to express here, of course, is that the passed in list cannot/willnot be changed, but once I make the list reference const I then have to use 'cons_iterator's which then prevent me from doing anything with the result.
What is "dong anything"? Modifying the data? That's changing the list, which is contradictory to your original intentions. If a list is const, it (and "it" includes its data) is constant.
If your function were to return a non-const iterator, it would create a way of modifying the list, hence the list wouldn't be const.

You are thinking about your design in the wrong way. Don't use const arguments to indicate what the function does - use them to describe the argument. In this case, it doesn't matter that find() doesn't change the list. What matters is that find() is intended to be used with modifiable lists.
If you want your find() function to return a non-const iterator, then it enables the caller to modify the container. It would be wrong for it to accept a const container, because that would provide the caller with a hidden way of removing the const-ness of the container.
Consider:
// Your function
MyList::iterator find(const MyList& list, int identifier);
// Caller has a const list.
const MyList list = ...
// but your function lets them modify it.
*( find(list,3) ) = 5;
So, your function should take a non-const argument:
MyList::iterator find(MyList& list, int identifier);
Now, when the caller tries to use your function to modify her const list, she'll get a compilation error. That's a much better outcome.

If you're going to return a non-const accessor to the container, make the function non-const as well. You're admitting the possibility of the container being changed by a side effect.
This is a good reason the standard algorithms take iterators rather than containers, so they can avoid this problem.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

What's wrong with passing C++ iterator by reference? - c++

I think the Standard Library algorithms pass iterators by value exclusively (someone will now post an obvious exception to this) - this may be the origin of the idea. Of course, nothing says that your own code has to look like the Standard Library!

Related

How to use `boost::range` iterators with standard iterators

Should conversions from non-const iterator to const iterator be avoided?

C++ STL - Why treat a function that returns an iterator as a void function?

Why does operator ++ return a non-const value?

How do I escape the const_iterator trap when passing a const container reference as a parameter

Categories

Resources