My goal would be to have an iterator that iterates over elements of type T, but in my case T is not really usable for my end-users. Instead, end-users should use a wrapper W that has a much more useful interface.
In order to construct a W, we need a T plus a reference or pointer to an additional data structure.
The problem is that I will never store elements as W. Instead, elements are always stored as T and only wrapped on demand. Therefore, my users have to iterate over a data structure holding Ts.
My idea was to write a custom iterator for these data structures that itself iterates over the stored Ts, but upon dereferencing will return W instead. I started looking into how this can be implemented and found various information on this topic, including how to deal with the iterator's reference typedef not actually being a reference. This includes the arrow_proxy trick for implementing operator-> in such cases.
However, I have also attempted to read the standard to see what it has to say about such iterators. My resource here is this, and there it clearly states that as soon as we are dealing with forward iterators, reference is expected to be a (const) reference to the value_type. This is supported by this question.
This makes me wonder whether it is even possible to reasonably implement such a transform_iterator that remains standard conforming if one intends to be using it as a forward_iterator or above?
One way that I could come up with would be to declare the value_type of my iterator to be W and then keep a member variable of type W around, such that operator* could be implemented like this:
class transform_iterator {
public:
    using value_type = W;
    using reference  = W&;
    // ...
    reference operator*() const {
        m_wrapper = W(/* obtain current T */, m_struct);
        return m_wrapper;
    }

private:
    mutable W m_wrapper;
    SeparateDataStructure m_struct;
};
However, this approach seems rather hacky to me. On top of that, it would increase the iterator's size considerably, which might or might not become an issue (in the long run).
Note 1: I know that Boost.Iterator provides a transform_iterator, but I can't quite work out from the implementation which iterator category they actually apply to these types of iterators. However, it does seem like they base the category on the result type of the supplied function (in some way), which suggests that at least in their implementation it is possible for the category to be different from input_iterator_tag (though maybe the only other option is output_iterator_tag?).
Note 2: The question linked above also suggests the same workaround that I sketched here. Does that indicate that there is no better way?
TL;DR: Is there a better way to achieve e.g. a forward iterator that transforms the iterated type on dereference to a different type than to store a member of that type in the iterator itself and update that member on every dereference?
First, choose the most restrictive iterator category you can get away with:
if you don't need to allow multiple passes, use InputIterator and just return a temporary W by value
This still has all the usual iterator methods (operator*, operator->, both operator++ etc.)
if you do need multiple passes, you need a ForwardIterator with its additional requirement to return an actual reference.
As you say, this can only be done by storing a W somewhere (in the iterator or off to the side).
The big problem is with mutable forward iterators: mutating *i must also affect *j if i == j, and that means your W can't be a standalone value type, but must be some kind of write-through proxy. That's not impossible, but you can't simply take some existing type and use it like this.
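To make the write-through idea concrete, here is a minimal sketch of such a proxy in the spirit of vector&lt;bool&gt;::reference. The names (bit_proxy, the word/mask layout) are purely illustrative, not taken from any real library:

```cpp
#include <vector>

// Minimal write-through proxy: assignment through the proxy writes back
// into the underlying storage, so two equal iterators dereference to
// proxies that affect the same element.
class bit_proxy {
public:
    bit_proxy(unsigned& word, unsigned mask) : m_word(word), m_mask(mask) {}

    // Reading converts to bool.
    operator bool() const { return (m_word & m_mask) != 0; }

    // Writing updates the underlying word.
    bit_proxy& operator=(bool b) {
        if (b) m_word |= m_mask; else m_word &= ~m_mask;
        return *this;
    }

private:
    unsigned& m_word;
    unsigned m_mask;
};
```

The point is that the proxy holds a reference into the container's storage rather than a copy of the value, which is exactly what "some kind of write-through proxy" requires and why an arbitrary existing type W can't be used directly.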
If you have C++20 access, you could probably save some effort by just using a transform_view - although this may be outweighed by the work of changing everything else to use ranges rather than raw iterators.
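For the input-iterator route, a minimal sketch might look like this. T, W, and the side data structure from the question are stood in for by int, a trivial wrapper struct, and an Extra struct; all of these names are illustrative assumptions:

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-ins: the container stores plain ints (the "T"),
// Extra is the side data structure, and W is the user-facing wrapper.
struct Extra { int scale; };

struct W {
    int value;                         // combines the stored T with Extra
    int scaled() const { return value; }
};

class transform_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = W;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;    // operator-> omitted in this sketch
    using reference         = W;       // a value, not a real reference

    transform_iterator(const int* pos, const Extra* extra)
        : m_pos(pos), m_extra(extra) {}

    // Dereferencing builds the wrapper on demand and returns it by value,
    // which is fine for an input iterator.
    reference operator*() const { return W{*m_pos * m_extra->scale}; }

    transform_iterator& operator++() { ++m_pos; return *this; }
    transform_iterator operator++(int) { auto tmp = *this; ++m_pos; return tmp; }

    bool operator==(const transform_iterator& o) const { return m_pos == o.m_pos; }
    bool operator!=(const transform_iterator& o) const { return m_pos != o.m_pos; }

private:
    const int* m_pos;
    const Extra* m_extra;
};
```

No mutable member, no increased iterator size; the cost is that the iterator honestly advertises only input_iterator_tag.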
Context
I was trying to implement an nD-array-like container: something that would wrap an underlying sequence container and allow processing it as a container of containers (of...): arr[i][j][k] should act as a (possibly const) reference to _arr[(((i * dim2) + j) * dim3) + k].
Fine so far: arr[i] just has to be a wrapper class over the subarray...
But when I tried to implement iterators, I suddenly realized that dragons were all around:
my container is not a standard compliant container because operator [] returns a proxy or wrapper instead of a true reference (When Is a Container Not a Container?)
this causes the iterator to be either a stashing iterator, which is known to be bad (Reference invalidation after applying reverse_iterator on a custom made iterator and its accepted answer),
... or a proxy iterator, which is not necessarily better (To Be or Not to Be (an Iterator))
The real problem is that as soon as you have a proxied container, no iterator can respect the following requirement for a forward iterator:
Forward iterators [forward.iterators]
...6 If a and b are both dereferenceable, then a == b if and only if *a and *b are bound to the same object.
Examples come from the standard library itself:
vector<bool> is known not to respect all the requirements for containers because it returns proxies instead of references:
Class vector [vector.bool] ... 3 There is no requirement that the data be stored as a contiguous allocation of bool values. A space-optimized representation of bits is recommended instead.
4 reference is a class that simulates the behavior of references of a single bit in vector<bool>.
filesystem path iterator is known to be a stashing iterator:
path iterators [fs.path.itr] ... 2 A path::iterator is a constant iterator satisfying all the requirements of a bidirectional iterator (27.2.6) except that, for dereferenceable iterators a and b of type path::iterator with a == b, there is no requirement that *a and *b are bound to the same object.
and from cppreference:
Notes: std::reverse_iterator does not work with iterators that return a reference to a member object (so-called "stashing iterators"). An example of stashing iterator is std::filesystem::path::iterator.
Question
I have currently found plenty of references about why proxied containers are not true containers and why it would be nice if proxied containers and iterators were allowed by the standard. But I have still not understood what was the best that could be done and what were the real limitations.
So my question is why proxy iterators are really better than stashing ones, and what algorithms are allowed for either of them. If possible, I would really love to find a reference implementation for such an iterator.
For reference, the current implementation of my code has been submitted on Code Review. It contains a stashing iterator (which broke immediately when I tried to use std::reverse_iterator).
OK, we have two similar but distinct concepts. So let's lay them out.
But first, I need to make a distinction between the pre-C++20 named requirements and the actual in-language concepts created for the Ranges TS and included in C++20. They're both called "concepts", but they're defined differently. As such, when I talk about concept-with-a-lowercase-c, I mean the pre-C++20 requirements. When I talk about Concept-with-a-capital-C, I mean the C++20 stuff.
Proxy Iterators
Proxy iterators are iterators where their reference is not a value_type&, but is instead some other type that behaves like a reference to value_type. In this case, *it returns a prvalue of this reference type.
The InputIterator concept imposes no requirement on reference, other than that it is convertible to value_type. However, the ForwardIterator concept makes the explicit statement that "reference is a reference to T".
Therefore, a proxy iterator cannot fit the ForwardIterator concept. But it can still be an InputIterator. So you can safely pass a proxy iterator to any function that only requires InputIterators.
So, the problem with vector<bool>'s iterators is not that they're proxy iterators. It's that they promise they fulfill the RandomAccessIterator concept (through the use of the appropriate tag), when they're really only InputIterators and OutputIterators.
The Ranges proposal (mostly) adopted into C++20 makes changes to the iterator Concepts which allow all iterators to be proxy iterators. So under Ranges, vector<bool>::iterator really fulfills the RandomAccessIterator Concept. Therefore, if you have code written against the Ranges concepts, then you can use proxy iterators of all kinds.
This is very useful for dealing with things like counting ranges. You can have reference and value_type be the same type, so you're just dealing with integers either way.
And of course, if you have control over the code consuming the iterator, you can make it do whatever you want, so long as you don't violate the concept your iterator is written against.
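The counting-range case mentioned above can be sketched as an iterator whose reference is simply its value_type. This is a minimal illustration, not any particular library's implementation; pre-C++20 it must honestly tag itself as an input iterator:

```cpp
#include <cstddef>
#include <iterator>

// Minimal counting iterator: value_type and reference are both int, so
// dereferencing yields a prvalue rather than a reference into storage.
class counting_iterator {
public:
    using iterator_category = std::input_iterator_tag; // honest pre-C++20 tag
    using value_type        = int;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = int;  // same as value_type: no storage behind it

    explicit counting_iterator(int n) : m_n(n) {}

    reference operator*() const { return m_n; }
    counting_iterator& operator++() { ++m_n; return *this; }
    counting_iterator operator++(int) { auto t = *this; ++m_n; return t; }

    bool operator==(const counting_iterator& o) const { return m_n == o.m_n; }
    bool operator!=(const counting_iterator& o) const { return m_n != o.m_n; }

private:
    int m_n;
};
```

Under the C++20 Concepts, an iterator shaped like this can model random access; under the old requirements it cannot go beyond InputIterator, which is exactly the difference the answer describes.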
Stashing Iterators
Stashing iterators are iterators where reference is (directly or indirectly) a reference to an object stored in the iterator. Therefore, if you make a copy of an iterator, the copy will return a reference to a different object than the original, even though they refer to the same element. And when you increment the iterator, previous references are no longer valid.
Stashing iterators are usually implemented because computing the value you want to return is expensive. Maybe it would involve a memory allocation (such as path::iterator) or maybe it would involve a possibly-complex operation that should only be done once (such as regex_iterator). So you only want to do it when necessary.
One of the foundations of ForwardIterator as a concept (or Concept) is that a range of these iterators represents a range over values which exist independently of their iterators. This permits multipass operation, but it also makes doing other things useful. You can store references to items in the range, and then iterate elsewhere.
If you need an iterator to be a ForwardIterator or higher, you should never make it a stashing iterator. Of course, the C++ standard library is not always consistent with itself. But it usually calls out its inconsistencies.
path::iterator is a stashing iterator. The standard says that it is a BidirectionalIterator; however, it also gives this type an exception to the reference/pointer preservation rule. This means that you cannot pass path::iterator to any code that might rely on that preservation rule.
Now, this doesn't mean you can't pass it to anything. Any algorithm which requires only InputIterator will be able to take such an iterator, since such code cannot rely on that rule. And of course, any code which you write or which specifically states in its documentation that it doesn't rely on that rule can be used. But there's no guarantee that you can use reverse_iterator on it, even though it says that it is a BidirectionalIterator.
regex_iterators are even worse in this regard. They are said to be ForwardIterators based on their tag, but the standard never says that they actually are ForwardIterators (unlike path::iterator). And the specification of them as having reference be an actual reference to a member object makes it impossible for them to be true ForwardIterators.
Note that I made no distinction between the pre-C++20 concept and the Ranges Concept. That's because the std::forward_iterator Concept still forbids stashing iterators. This is by design.
Usage
Now obviously, you can do whatever you want in your code. But code you don't control will be under the domain of its owners. They will be writing against the old concepts, the new Concepts, or some other c/Concept or requirement that they specify. So your iterators need to be able to be compatible with their needs.
The algorithms that Ranges introduces use the new Concepts, so you can always rely on them to work with proxy iterators. However, as I understand it, the Ranges Concepts are not back-ported into the older algorithms.
Personally, I would suggest avoiding stashing iterator implementations entirely. By providing complete support for proxy iterators, most stashing iterators can be rewritten to return values rather than references to objects.
For example, if there were a path_view type, path::iterator could have returned that instead of a full-fledged path. That way, if you want to do the expensive copy operation, you can. Similarly, the regex_iterators could have returned copies of the match object. The new Concepts make it possible to work that way by supporting proxy iterators.
Now, stashing iterators handle caching in a useful way; iterators can cache their results so that repeated *it usage only does the expensive operation once. But remember the problem with stashing iterators: returning a reference to their contents. You don't need to do that just to get caching. You can cache the results in an optional<T> (which you invalidate when the iterator is in/decremented). So you can still return a value. It may involve an additional copy, but reference shouldn't be a complex type.
Of course, all of this means that auto &val = *it; isn't legal code anymore. However, auto &&val = *it; will always work. This is actually a big part of the Range TS version of iterators.
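The optional&lt;T&gt; caching idea can be sketched like this (C++17 for std::optional; the "expensive" computation is stood in by std::to_string, and the class name is illustrative):

```cpp
#include <optional>
#include <string>

// Value-returning caching: the expensive result is cached in an optional
// and invalidated on increment, but operator* still returns a value, so
// no reference to a member object ever escapes (i.e. not a stashing iterator).
class caching_iterator {
public:
    explicit caching_iterator(int pos) : m_pos(pos) {}

    std::string operator*() const {
        if (!m_cache)                         // expensive work once per position
            m_cache = std::to_string(m_pos);  // stand-in for the costly computation
        return *m_cache;                      // returned by value, not by reference
    }

    caching_iterator& operator++() {
        ++m_pos;
        m_cache.reset();                      // invalidate the cache on increment
        return *this;
    }

private:
    int m_pos;
    mutable std::optional<std::string> m_cache;
};
```

Repeated dereferences at the same position pay the cost once, yet copies of the iterator remain independent, which is the property stashing iterators lose.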
I'm trying to implement a container which (for some reason) does not allow implementing a simple "reference type". It is something similar to vector<bool>, where the reference is actually a small object representing a "reference" to a particular element of the container.
Now I would like to use "modifying" STL algorithms like std::sort on this container and make it as efficient as possible. Everything works quite well as long as the algorithm is based simply on swaps and comparisons of the reference objects, since both can be done fast even for heavy objects stored in the container.
The problem comes when the algorithm is trying to make a copy to a temporary value like, e.g., in some implementations of the insertion sort where at some point the code contains the statement
value_type val = *iterator;
where *iterator (the custom operator*()) returns a light-weight reference object by value. I could implement a move constructor (as well as the assignment) from a reference r-value to the value type and implement it efficiently, but this would effectively "steal" the content of the value the iterator "points" to, which is not at all the intention of the above statement. What would be nice is if there were a way to distinguish between this kind of statement and
value_type val = std::move(*iterator);
where I could steal the content of the "reference" without remorse.
I thought about adding a "flag" to the reference object to denote whether it is safe to move (rather than copy) from the given reference r-value, and allow the move as soon as std::move is applied explicitly. I understand that std::move is simply a static cast, so implementing a "custom" move is not a real option here (unlike, e.g., a custom swap).
Any ideas or pointers would be helpful.
You may add an overload of move for wrapperType:
namespace std
{
    SomeType move(my_namespace::wrapperType&) { /* your implementation */ }
}
Take care to include it everywhere std::move can be called; otherwise the program is ill-formed, no diagnostic required.
std::forward may need similar treatment.
Note: declaring/defining functions or classes in namespace std is, in most cases, undefined behavior.
I have a class akin to the following:
struct Config
{
    using BindingContainer = std::map<ID, std::vector<Binding>>;
    using BindingIterator = BindingContainer::mapped_type::const_iterator;

    boost::iterator_range<BindingIterator> bindings(ID id) const;

private:
    BindingContainer m_bindings;
};
Since the ID passed to bindings() might not exist, I need to be able to represent a 'no bindings' value in the return type domain.
I don't need to differentiate an unknown ID from an ID mapped to an empty vector, so I was hoping to achieve this with the interface as above and return an empty range with default-constructed iterators. Unfortunately, although a ForwardIterator is DefaultConstructible [C++11 24.2.5/1], the result of comparing a singular iterator is undefined [24.2.1/5], so without a container it seems this is not possible.
I could change the interface to e.g. wrap the iterator_range in a boost::optional, or return a vector by value instead; the former is a little more clunky for the caller, though, and the latter has undesirable copy overheads.
Another option is to keep a statically-allocated empty vector and return its iterators. The overhead wouldn't be problematic in this instance, but I'd like to avoid it if I can.
Adapting the map iterator to yield comparable default-constructed iterators is a possibility, though seems over-complex...
Are there any other options here that would support returning an empty range when there is no underlying container?
(Incidentally I'm sure a while back I read a working paper or article about producing empty ranges for standard container type when there is no container object, but can't find anything now.)
(Note I am limited to C++11 features, though I'd be interested if there is any different approach requiring later features.)
Nope, there aren't. Your options are as you suggest. Personally, I would probably go with the idea of hijacking an iterator pair from a static empty vector; I can't imagine what notional "overhead" would be involved here, beyond a couple of extra bytes in your process image.
Is this a singular iterator and, if so, can I compare it to another one?
Comparing default-constructed iterators with operator==
And this hasn't changed in either C++14 or C++17 (so far).
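The static-empty-vector workaround can be sketched as follows. Binding and the function name are illustrative stand-ins; a std::pair is used here instead of boost::iterator_range to keep the sketch self-contained (C++11 is enough):

```cpp
#include <utility>
#include <vector>

using Binding = int;  // hypothetical element type

// When there is no underlying vector for the requested ID, hand out
// iterators into a function-local static empty vector. They are not
// singular, so they compare equal and form a valid empty range.
std::pair<std::vector<Binding>::const_iterator,
          std::vector<Binding>::const_iterator>
empty_binding_range() {
    static const std::vector<Binding> empty;  // lives for the program's lifetime
    return {empty.begin(), empty.end()};
}
```

The only "overhead" is one empty vector per program, and since C++11 the local static is initialized in a thread-safe way.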
You may use a default constructed boost::iterator_range
from (https://www.boost.org/doc/libs/1_55_0/libs/range/doc/html/range/reference/utilities/iterator_range.html):
However, if one creates a default constructed iterator_range, then one
can still call all its member functions. This design decision avoids
the iterator_range imposing limitations upon ranges of iterators that
are not singular.
Example here:
https://wandbox.org/permlink/zslaPwmk3lBI4Q9N
I wish to define a custom iterator which uses a value type that cannot be copied.
The rationale for this is that the iterator will be responsible for module enumeration under Windows, and I'm using the CreateToolhelp32Snapshot/Module32First/Module32Next APIs to avoid having to preprocess the entire module list (i.e. each iterator increment should advance to the next module in the list on demand, to avoid unnecessary overhead). The problem with this is that these APIs require the use of a HANDLE managed by the Toolhelp APIs, so I have no control over the 'position' in the list other than by calls to First/Next.
I could technically allow copying of the handle, but then you run into problems like this:
auto Iter1 = ModList.begin(); // Assume 'ModList' is an instance of my class which manages construction etc of my iterator
auto Iter2 = Iter1; // Both iterators now point to the first module in the list
++Iter1; // BOTH iterators now point to the second module in this list!! We just 'broke' the expected behavior of 'Iter2'.
Is it possible to define an STL-compatible iterator (that will work with the standard algorithms etc) that plays nice with a value type that cannot be copied or assigned?
I could also wrap the value type in a shared pointer in the implementation of the iterator, and that would make the iterator itself copyable and assignable, but it still doesn't solve the problem outlined in the code above.
Note: I could make the value type moveable if that helps.
I already have a heavy Boost dependency in my code base so feel free to suggest solutions which use Boost.
Sorry for the poorly written question, I've been up for quite a while and my brain doesn't want to work properly anymore. :P Let me know if clarification is needed.
If you tag your iterator as an InputIterator rather than a ForwardIterator, then the behavior you describe is not unexpected. Incrementing an InputIterator invalidates copies of it, so it doesn't matter what they appear to point to after that.
You can use it with standard algorithms that accept an InputIterator, which is exactly those that can reasonably be implemented as single-pass rather than multi-pass algorithms.
Depending on whether your algorithm implementations check iterator tags, though, you may not get any help ensuring that you don't use it incorrectly. There's no difference in the "signature" of the InputIterator and ForwardIterator interfaces, only in the semantics, so compile-time duck typing alone doesn't help.
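A single-pass iterator of this kind might be sketched like so. The shared_ptr stands in for the shared Toolhelp HANDLE, and the stored ints stand in for MODULEENTRY32-like records; all names here are illustrative, not a real Toolhelp wrapper:

```cpp
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Input iterator over a single-pass source: copies share the underlying
// cursor (standing in for the Toolhelp HANDLE), so incrementing one copy
// affects the others -- exactly the behavior InputIterator permits.
class module_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = int;          // stand-in for a module record
    using difference_type   = std::ptrdiff_t;
    using pointer           = const int*;
    using reference         = const int&;

    module_iterator() = default;                        // end iterator
    explicit module_iterator(std::vector<int> modules)  // begin iterator
        : m_state(std::make_shared<State>(State{std::move(modules), 0})) {}

    reference operator*() const { return m_state->modules[m_state->pos]; }

    module_iterator& operator++() {
        if (++m_state->pos == m_state->modules.size())
            m_state.reset();  // reached the end: become the end iterator
        return *this;
    }

    // Identity-based comparison: fine for a single-pass iterator.
    bool operator==(const module_iterator& o) const { return m_state == o.m_state; }
    bool operator!=(const module_iterator& o) const { return !(*this == o); }

private:
    struct State { std::vector<int> modules; std::size_t pos; };
    std::shared_ptr<State> m_state;
};
```

Note that two copies of the same begin iterator advance together, as in the question's example; with input_iterator_tag that is allowed, because incrementing invalidates all other copies.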
An iterator should logically refer to an element - it shouldn't be an element. You'll need a backend collection of modules, and iterators that refer to individual elements in this collection - just like the stdlib iterators. Incrementing an iterator should make it change which actual element it refers to, without modifying the backend collection.
For your particular problem, you could make a collection which continues the Toolhelp iteration on demand: whenever someone refers to a module you haven't yet iterated to, you fill in more data in the collection.
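That lazily-filled backend collection might be sketched like this. The vector of ints stands in for the Toolhelp API as the single-pass source, and a deque is used for the cache so references to cached elements stay valid as it grows; all names are illustrative:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Lazily-filled backend: elements are pulled from a single-pass source only
// when an index beyond what has been fetched is requested, so iterators
// (or indices) can refer to stable elements in the cache.
class lazy_modules {
public:
    explicit lazy_modules(std::vector<int> source)  // stand-in for the Toolhelp cursor
        : m_source(std::move(source)) {}

    // Fetch from the source up to and including index i, then return the element.
    const int& at(std::size_t i) {
        while (m_cache.size() <= i && m_next < m_source.size())
            m_cache.push_back(m_source[m_next++]);  // the on-demand "Module32Next" call
        return m_cache.at(i);                       // throws if i is out of range
    }

private:
    std::vector<int> m_source;
    std::size_t m_next = 0;
    std::deque<int> m_cache;  // deque: growth never invalidates element references
};
```

Real iterators over this collection can then be plain index-holding forward iterators, since each element, once fetched, has a stable address independent of any iterator.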
Someone here recently brought up the article from Scott Meyers that says:
Prefer iterators over const_iterators (pdf link).
Someone else was commenting that the article is probably outdated. I'm wondering what your opinions are?
Here is mine: one of the main points of the article is that you cannot erase or insert via a const_iterator, but I think it's funny to use that as an argument against const_iterators. I thought the whole point of const_iterators is that you do not modify the range at all, neither the elements themselves by substituting their values nor the range by inserting or erasing. Or am I missing something?
I totally agree with you.
I think the answer is simple:
Use const_iterators where const values are the right thing to use, and vice versa.
Seems to me that those who are against const_iterators must be against const in general...
Here's a slightly different way to look at it. const_iterator almost never makes sense when you are passing it as a pointer into a specific collection and you are passing the collection as well. Mr. Meyers was specifically stating that a const_iterator cannot be used with most member functions of a collection instance. In that case, you will need a plain old iterator. However, if you don't have a handle to the collection, the only difference between the two is that you can modify what is pointed to by an iterator and you can't modify the object referenced by a const_iterator.
So... you want to use iterator whenever you are passing a collection and position into the collection to an algorithm. Basically, signatures like:
void some_operation(std::vector<int>& vec, std::vector<int>::const_iterator pos);
don't make a whole lot of sense. The implicit statement is that some_operation is free to modify the underlying collection but is not allowed to modify what pos references. That doesn't make much sense. If you really want this, then pos should be an offset instead of an iterator.
On the flip side, most of the algorithms in the STL are based on ranges specified by a pair of iterators. The collection itself is never passed so the difference between iterator and const_iterator is whether the value in the collection can be modified through the iterator or not. Without a reference to the collection, the separation is pretty clear.
Hopefully that made things as clear as mud ;)
I don't think this particular statement of Meyer's needs to be taken with special concern. When you want a non-modifying operation, it is best to use a const_iterator. Otherwise, use an ordinary iterator. However, do note the one important thing: Never mix iterators i.e. const ones with non-const ones. As long as you are aware of the latter, you should be fine.
I generally prefer constness, but recently came across a conundrum with const_iterators that has confused my "always use const where possible" philosophy:
MyList::const_iterator find( const MyList & list, int identifier )
{
    // do some stuff to find identifier
    return retConstIter;
}
Since passing in a const list reference required that I use only const iterators, if I now use the find, I cannot do anything with the result but look at it, even though all I wanted to do was express that find would not change the list being passed in.
I wonder, then, whether Scott Meyers' advice has to do with issues like this, where it becomes impossible to escape const-ness. From what I understand, you cannot (reliably) un-const a const_iterator with a simple cast because of some internal details. This may also (perhaps in conjunction) be part of the issue.
this is probably relevant: How to remove constness of const_iterator?
C++98
I think one needs to take into account that Meyers' statement refers to C++98. It's hard to tell today, but if I remember right:
it simply was not easy to get a const_iterator for a non-const container at all
if you got a const_iterator, you could hardly have made any use of it, since most (all?) position arguments for container member functions were expected to be iterators and not const_iterators
e.g.
std::vector<int> container;
would have required
static_cast<std::vector<int>::const_iterator>(container.begin())
to get a const_iterator, which would have considerably inflated a simple .find
and even if you had your result then after
std::vector<int>::const_iterator i = std::find(static_cast<std::vector<int>::const_iterator>(container.begin()), static_cast<std::vector<int>::const_iterator>(container.end()),42);
there would have been no way to use your std::vector<int>::const_iterator for insertion into the vector or with any other member function that expected iterators for a position. And there was no way to get an iterator from a const_iterator; no way of casting existed (exists?) for that.
Because a const iterator does not mean that the container cannot be changed, but only that the element pointed to cannot be changed (a const iterator being the equivalent of a pointer to const), that was really a big pile of crap to deal with in such cases.
Today the opposite is true.
const iterators are easy to get using cbegin etc., even for non-const containers, and all (?) member functions that take positions accept const iterators as their arguments, so there is no need for any conversion.
std::vector<int> container;
auto i = std::find(container.cbegin(), container.cend(), 42);
container.insert(i, 43);
So what once was
Prefer iterators over const_iterators
today really really should be
Prefer const_iterators over iterators
since the first one is simply an artifact of historical implementation deficits.
By my reading of that link, Meyers appears to be fundamentally saying that iterators are better than const_iterators because you cannot make changes via a const_iterator.
But if that is what he is saying, then Meyers is in fact wrong. This is precisely why const_iterators are better than iterators when that is what you want to express.