Turning the next(), hasNext() iterator interface into begin(), end() interface - c++

I have to use an external library that I cannot change. Among other things, this library can tokenize specially formatted files using its internal logic. The tokenizer offers an iterator interface for accessing tokens, which looks like the following simplified example:
class Tokenizer {
public:
/* ... */
Token token() const; // returns the current token
Token next() const; // returns the next token
bool hasNext() const; // returns 'true' if there are more tokens
/* ... */
};
I would like to implement an iterator wrapper for the presented Tokenizer that allows the use of the standard algorithms library (std::copy_if, std::count, etc.). More specifically, it suffices if the iterator wrapper meets the requirements of an input iterator.
My current attempt looks like the following:
class TokenIterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = Token;
using difference_type = std::ptrdiff_t;
using pointer = const value_type*;
using reference = const value_type&;
explicit TokenIterator(Tokenizer& tokenizer) :
tokenizer(tokenizer) {
}
TokenIterator& operator++() {
tokenizer.next();
return *this;
}
value_type operator*() {
return tokenizer.token();
}
private:
Tokenizer& tokenizer;
};
I got stuck on implementing functions such as begin and end, the equality comparison, etc. So, my questions are:
How can I construct a TokenIterator instance which indicates the end of the token sequence (i.e. hasNext() == false) and how can I compare it to another TokenIterator instance to decide whether they are same?
Is it a good approach if I return a value from the overload of operator*() instead of a reference?

First, I recommend taking a close look at http://www.boost.org/doc/libs/1_65_1/libs/iterator/doc/iterator_facade.html
I find that it vastly reduces the amount of boilerplate needed for code like this.
Then, you have to decide how you wish to represent an iterator that has reached the "end". One approach is to make a default constructed iterator be the "end" iterator. It contains no object and you must not increment or dereference it.
The "begin" iterator is then a non-default-constructed iterator. It has an object and you can dereference it. Incrementing this iterator simply checks hasNext(). If true, set the contained object to next(). If false, clear the contained object and make this iterator look like a default-constructed one.
There shouldn't be any problems returning by value from operator*. If you bind the result to a const reference (or to auto&&), lifetime extension will keep the value around until the reference goes out of scope. That said, any code that assumes such references remain valid over multiple iterations WILL break, so stick to simple for (auto val : tokens) or for (const auto& val : tokens). Note that for (auto& val : tokens) does not compile when operator* returns by value, because a non-const lvalue reference cannot bind to a temporary.
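As a rough illustration of the scheme described above, here is a minimal sketch. WordSource is a hypothetical stand-in for any next()/hasNext() style source (it is not the Tokenizer from the question), and WordIterator and count_long_words are names made up for this example. The iterator caches the current value, and a default-constructed iterator plays the role of "end":

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a next()/hasNext() style source.
class WordSource {
public:
    explicit WordSource(std::vector<std::string> words) : words_(std::move(words)) {}
    bool hasNext() const { return pos_ < words_.size(); }
    std::string next() { return words_[pos_++]; }  // returns the next word and advances
private:
    std::vector<std::string> words_;
    std::size_t pos_ = 0;
};

class WordIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string*;
    using reference = const std::string&;

    WordIterator() = default;  // "end" iterator: holds no source
    explicit WordIterator(WordSource& source) : source_(&source) { fetch(); }

    reference operator*() const { assert(source_); return current_; }
    pointer operator->() const { assert(source_); return &current_; }
    WordIterator& operator++() { assert(source_); fetch(); return *this; }
    WordIterator operator++(int) { WordIterator old = *this; ++*this; return old; }

    // Two iterators are equal if both are "end" or both read from the same source.
    friend bool operator==(const WordIterator& a, const WordIterator& b) { return a.source_ == b.source_; }
    friend bool operator!=(const WordIterator& a, const WordIterator& b) { return !(a == b); }

private:
    // Pull the next value, or turn into the "end" iterator when the source is exhausted.
    void fetch() {
        if (source_->hasNext()) current_ = source_->next();
        else source_ = nullptr;
    }
    WordSource* source_ = nullptr;
    std::string current_;
};

inline WordIterator begin(WordSource& s) { return WordIterator(s); }
inline WordIterator end(WordSource&) { return WordIterator(); }

// Example use with a standard algorithm.
inline std::ptrdiff_t count_long_words(std::vector<std::string> words, std::size_t min_len) {
    WordSource src(std::move(words));
    return std::count_if(begin(src), end(src),
                         [min_len](const std::string& w) { return w.size() >= min_len; });
}
```

Because this is a single-pass input iterator, all copies share the same underlying source, so only the most recently advanced copy should be used.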

Following the suggestions of the accepted answer, I successfully implemented the iterator wrapper I had in mind.
Here is an example implementation which corresponds to the example shown in the question:
class TokenIterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = Token;
using difference_type = std::ptrdiff_t;
using pointer = const value_type*;
using reference = const value_type&;
TokenIterator() : tokenizer(nullptr), token(value_type()) {
}
TokenIterator(Tokenizer& tokenizerToWrap) : TokenIterator() {
if(tokenizerToWrap.hasNext()) {
tokenizer = &tokenizerToWrap;
token = tokenizerToWrap.token();
}
}
TokenIterator(const TokenIterator& other) :
tokenizer(other.tokenizer), token(other.token) {
}
reference operator*() const {
assertTokenizer();
return token;
}
pointer operator->() const {
return &(operator*());
}
TokenIterator& operator++() {
assertTokenizer();
if(tokenizer->hasNext())
token = tokenizer->next();
else
*this = TokenIterator();
return *this;
}
TokenIterator operator++(int) {
TokenIterator previousState = *this;
operator++();
return previousState;
}
friend bool operator==(const TokenIterator& lhs, const TokenIterator& rhs) {
return lhs.tokenizer == rhs.tokenizer;
}
friend bool operator!=(const TokenIterator& lhs, const TokenIterator& rhs) {
return !(lhs == rhs);
}
private:
void assertTokenizer() const {
if(!tokenizer) throw std::out_of_range("iterator is out of range");
}
Tokenizer* tokenizer;
value_type token;
};
For the compatibility with the range-based for loop, here are the necessary begin() and end() functions:
TokenIterator begin(Tokenizer& tokenizerToWrap) {
return TokenIterator(tokenizerToWrap);
}
TokenIterator end(Tokenizer&) {
return TokenIterator();
}

Related

C++: "Iterable<T>" interface

What I want to achieve is probably easily explained: Consider I have an abstract class that I know will contain multiple objects of known type. However the actual container holding these objects will be implemented in sub-classes.
In my abstract base class I now want to provide an interface to iterate over these objects. Given that I don't know (or rather don't want to fix) the type of container, I thought that iterators would probably be my best bet.
A conceptual declaration of this class might look like this:
class MyClass {
public:
// Other interface methods, e.g. size()
virtual Iterable<MyObject> objects() = 0;
};
The intention here is that I'll be able to iterate over the nested objects of my class like this:
MyClass *instance = new ImplementationOfClass(); // "class" is a keyword and cannot be used as a variable name
for (const MyObject &obj : instance->objects()) {
// Do stuff with obj
}
The issue I am facing however is that I can't seem to figure out how Iterable<MyObject> should be defined. The key property of this object is that at the time of defining this class I can only specify that the returned value will be iterable (using STL-style iterators) and will yield objects of type MyObject when the used iterator is dereferenced.
Normally I would use an abstract class on its own for this but it seems that this is very tricky (impossible?) since iterators are always passed by value and thus to my knowledge no Polymorphism is possible.
Questions dealing with how to pass arbitrary iterator types as arguments into a function always come up with the "use templates" answer. However I think in my case I can't use templates for that. This assumption might be wrong though, so feel free to correct me.
Essentially the barrier I always run into is that at some point I have to write down the iterator type explicitly which in my case I can't. I thought about using a template for that but this would then inhibit proper Polymorphism (I think?) because the user of that abstract interface seems to have the burden of explicitly initializing the correct template. The whole point of all of this however is that the caller does not have to care about the underlying structure.
TL;DR: Is there a way to create an interface class that only promises to be iterable and that dereferencing an iterator will yield an object of type T?
With the help of @FrançoisAndrieux and a hint from https://stackoverflow.com/a/4247445/3907364, I was able to come up with an approach to my problem.
Essentially the idea is to create an iterator-wrapper that stores a function to obtain an object of the given type if given an index. That index is then what is iterated on.
The nice thing about this is that the iterator interface is fixed by specifying the type of object that dereferencing it should return. The polymorphism comes into play by making the member function objects() virtual so that each sub-class can construct the iterator itself, providing a custom function pointer. Thus as long as there is a way to map an index to the respective element in the container (whichever is used), this trick is usable.
Note that you can either directly use a pointer to a member function such as std::vector<T>::at, or create a custom function that returns the respective element.
Here's the implementation for the iterator (The implementation could probably be improved upon but it seems to get the job done):
template< typename T > struct iterator_impl {
using iterator_category = std::forward_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = T;
using pointer = T *;
using reference = T &;
using access_function_t = std::function< T &(std::size_t) >;
// regular Ctor
iterator_impl(std::size_t start, access_function_t &func, const void *id)
: m_index(start), m_func(func), m_id(id) {}
// function-move Ctor
iterator_impl(std::size_t start, access_function_t &&func, const void *id)
: m_index(start), m_func(std::move(func)), m_id(id) {} // without std::move, func would be copied
// copy Ctor
iterator_impl(const iterator_impl &) = default;
// move ctor
iterator_impl(iterator_impl &&other) {
std::swap(m_index, other.m_index);
m_func = std::move(other.m_func);
std::swap(m_id, other.m_id);
}
// copy-assignment
iterator_impl &operator=(const iterator_impl &other) = default;
// prefix-increment
iterator_impl &operator++() {
++m_index;
return *this;
}
// postfix-increment
iterator_impl operator++(int) {
iterator_impl old = *this;
++(*this);
return old;
}
bool operator==(const iterator_impl &other) const { return m_index == other.m_index && m_id == other.m_id; }
bool operator!=(const iterator_impl &other) const { return !(*this == other); }
T &operator*() const { return m_func(m_index); }
T *operator->() const { return &m_func(m_index); }
protected:
std::size_t m_index = 0;
access_function_t m_func;
const void *m_id = nullptr;
};
Note that I had to introduce the m_id member variable as a means to properly compare iterators (std::function objects can't be compared with each other using ==). It is meant to be, for example, the address of the container that holds the elements. Its sole purpose is to make sure that two iterators that happen to have the same index but iterate over completely different sets are not considered equal.
And based on that here's an implementation of an Iterable<T>:
template< typename T > struct Iterable {
using iterator = iterator_impl< T >;
using const_iterator = iterator_impl< const std::remove_const_t< T > >;
Iterable(std::size_t start, std::size_t end, typename iterator_impl< T >::access_function_t &func, const void *id)
: m_begin(start, func, id), m_end(end, func, id) {}
iterator begin() { return m_begin; }
iterator end() { return m_end; }
const_iterator begin() const { return m_begin; }
const_iterator end() const { return m_end; }
const_iterator cbegin() const { return m_begin; }
const_iterator cend() const { return m_end; }
protected:
iterator m_begin;
iterator m_end;
};
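To show how the pieces might fit together, here is a condensed sketch of the same idea with usage. All names in it (IndexIterator, IndexRange, make_index_range, Shape, ShapeStore, VectorStore, DequeStore, total_sides) are made up for this example and are not the types defined above. Each subclass only has to supply a way to map an index to an element:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Condensed index-based iterator: dereferencing calls a stored access function.
template <typename T>
class IndexIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using difference_type = std::ptrdiff_t;
    using value_type = T;
    using pointer = T*;
    using reference = T&;
    using Access = std::function<T&(std::size_t)>;

    IndexIterator() = default;
    IndexIterator(std::size_t index, Access access, const void* id)
        : index_(index), access_(std::move(access)), id_(id) {}

    T& operator*() const { return access_(index_); }
    T* operator->() const { return &access_(index_); }
    IndexIterator& operator++() { ++index_; return *this; }
    IndexIterator operator++(int) { IndexIterator old = *this; ++index_; return old; }
    bool operator==(const IndexIterator& o) const { return index_ == o.index_ && id_ == o.id_; }
    bool operator!=(const IndexIterator& o) const { return !(*this == o); }

private:
    std::size_t index_ = 0;
    Access access_;
    const void* id_ = nullptr;  // identifies the underlying container
};

template <typename T>
struct IndexRange {
    IndexIterator<T> first, last;
    IndexIterator<T> begin() const { return first; }
    IndexIterator<T> end() const { return last; }
};

// Builds a range over any container that offers operator[] and size().
template <typename T, typename Container>
IndexRange<T> make_index_range(Container& c) {
    typename IndexIterator<T>::Access access = [&c](std::size_t i) -> T& { return c[i]; };
    return IndexRange<T>{IndexIterator<T>(0, access, &c), IndexIterator<T>(c.size(), access, &c)};
}

struct Shape { int sides; };

// The abstract interface only promises "something iterable that yields Shape&".
class ShapeStore {
public:
    virtual ~ShapeStore() = default;
    virtual IndexRange<Shape> objects() = 0;
};

class VectorStore : public ShapeStore {
public:
    explicit VectorStore(std::vector<Shape> s) : shapes_(std::move(s)) {}
    IndexRange<Shape> objects() override { return make_index_range<Shape>(shapes_); }
private:
    std::vector<Shape> shapes_;
};

class DequeStore : public ShapeStore {
public:
    explicit DequeStore(std::deque<Shape> s) : shapes_(std::move(s)) {}
    IndexRange<Shape> objects() override { return make_index_range<Shape>(shapes_); }
private:
    std::deque<Shape> shapes_;
};

// The caller never sees which container is used.
inline int total_sides(ShapeStore& store) {
    int total = 0;
    for (const Shape& s : store.objects()) total += s.sides;
    return total;
}
```

The price of this flexibility is a std::function call on every dereference, which is usually slower than a container's native iterator.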

Valid way of accessing the address of the one-past-end element of a vector

I wanted to implement an iterator so that a custom class can be used in a range-based for loop. The iterator accesses an internal std::vector of std::unique_ptr to a Base class and returns a reference to a Child class.
This is what I came up with:
using upBase = std::unique_ptr<Base>;
class Test
{
std::vector<upBase> list;
public:
void Add(upBase&& i) { list.push_back(std::move(i)); }
class iterator
{
upBase* ptr;
public:
iterator(upBase* p) : ptr(p) {}
bool operator!=(const iterator& o) { return ptr != o.ptr; }
iterator& operator++() { ++ptr; return *this; }
Child& operator*() { return *(Child*)(*ptr).get(); }
const Child& operator*() const { return *(Child*)(*ptr).get(); }
};
iterator begin() { return iterator(&list[0]); }
iterator end() { return iterator(&list[list.size()]); }
};
This works fine on the latest compilers (tested on GodBolt with GCC, Clang and MSVC) but when using Visual Studio 2015 the end() method throws a run-time exception:
Debug assertion failed. C++ vector subscript out of range.
I searched the internet for a proper way to access the address of the one-past-end element of a std::vector, but didn't find anything except complicated pointer arithmetic.
I finally came up with the following implementation for the begin() and end() methods:
iterator begin() { return iterator(&list.front()); }
iterator end() { return iterator(&list.back() + 1); }
This doesn't complain at run-time. Is it the correct way to access the address of the one-past-end element of an std::array or std::vector?
If not, what would be the proper way?
What would be the proper way?
You are trying to re-invent the wheel. You do not need to implement the class iterator for your Test, as you could get the begin and end iterator from the list (i.e. std::vector<upBase>::begin and std::vector<upBase>::end)
Therefore just make them available via corresponding member functions in Test class:
class Test
{
std::vector<upBase> list;
public:
void Add(upBase&& i) { list.push_back(std::move(i)); }
auto begin() /* const noexcept */ { return list.begin(); }
auto end() /* const noexcept */ { return list.end(); }
};
Also note that a deduced auto return type is only possible since C++14. If the compiler does not support C++14, you can provide a trailing return type instead, as follows (assuming you have access to at least C++11):
auto begin() -> decltype(list.begin()) { return list.begin(); }
auto end() -> decltype(list.end()) { return list.end(); }
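If the loop body should still see Child& rather than std::unique_ptr<Base>&, one possible middle ground is to keep a small custom iterator but let it wrap the vector's own iterator, so the one-past-end position comes from list.end() and no element is ever indexed. The following is only a sketch: Base, Child and sum_values are placeholder definitions written for this example, and it assumes, as in the question, that every stored element really is a Child.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Placeholder types for the example.
struct Base { virtual ~Base() = default; };
struct Child : Base {
    int value;
    explicit Child(int v) : value(v) {}
};

using upBase = std::unique_ptr<Base>;

class Test {
    std::vector<upBase> list;
public:
    void Add(upBase&& i) { list.push_back(std::move(i)); }

    class iterator {
        std::vector<upBase>::iterator it;  // the vector's iterator does the real work
    public:
        explicit iterator(std::vector<upBase>::iterator i) : it(i) {}
        bool operator==(const iterator& o) const { return it == o.it; }
        bool operator!=(const iterator& o) const { return it != o.it; }
        iterator& operator++() { ++it; return *this; }
        // Assumes the element is a Child; use dynamic_cast if that is not guaranteed.
        Child& operator*() const { return static_cast<Child&>(**it); }
    };

    iterator begin() { return iterator(list.begin()); }
    iterator end() { return iterator(list.end()); }  // never dereferenced
};

inline int sum_values(Test& t) {
    int sum = 0;
    for (Child& c : t) sum += c.value;
    return sum;
}
```

If raw pointers are really needed, list.data() and list.data() + list.size() give the begin and one-past-end addresses without touching any element, and they are also well defined for an empty vector, unlike &list.front() and &list.back() + 1.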

How to make a hierarchy of different object "generators" in plain C++11

What I need is the following hierarchy of classes (given here as a sketch):
class DataClass {};
class AbstractGenerator {
public:
// Generates DataClass objects one by one, in a lazy manner
virtual DataClass produce() = 0;
};
class RandGenerator : public AbstractGenerator {
public:
RandGenerator(int maximal_int) : maximal(maximal_int) {}
DataClass produce() override {
// get a random number from 0 to this->maximal
// make a DataClass object from the random int and return it
}
private:
int maximal;
};
class FromFileGenerator : public AbstractGenerator {
public:
FromFileGenerator(std::string file_name) : f_name(file_name) {}
DataClass produce() override {
// read the next line from the file
// deserialize the DataClass object from the line and return it
}
private:
std::string f_name;
};
What I want to support for both RandGenerator and FromFileGenerator is:
RandGenerator rg(10); // any maximal_int; note that "RandGenerator rg();" would declare a function instead
for (DataClass data : rg) {...}
And also some method of taking "first n elements of the generator".
What are the appropriate tools in the plain C++11 that one could use to achieve this, or whatever is the closest to this in C++11?
boost::function_input_iterator is the normal tool for this job, but since you want "plain" C++, we can just reimplement the same concept.
class generator_iterator {
AbstractGenerator* generator; // non-owning: the generator must outlive the iterator
public:
using iterator_category = std::input_iterator_tag;
explicit generator_iterator(AbstractGenerator* generator_)
:generator(generator_) {}
DataClass operator*(){return generator->produce();}
generator_iterator& operator++(){return *this;}
generator_iterator operator++(int){return *this;}
//plus all the other normal bits for an input iterator
};
And then in your AbstractGenerator class, provide begin and end methods.
generator_iterator begin() {return generator_iterator(this);}
generator_iterator end() {return generator_iterator(nullptr);} //or other logic depending on how you want to detect the end of a series
Add a begin and end member function to AbstractGenerator, which return iterators which call the produce member function.
Such an iterator (demo) could look similar to this:
template<typename Fn>
struct CallRepeatedlyIterator
{
using iterator_category = std::input_iterator_tag;
using value_type = typename std::result_of<Fn()>::type;
// Iterator difference types must be signed; std::ptrdiff_t (from <cstddef>) is the usual choice:
using difference_type = std::ptrdiff_t;
using pointer = value_type *;
using reference = value_type &;
bool is_end;
union {
Fn fn;
};
union {
value_type buffer;
};
value_type operator*() const
{
return buffer;
}
CallRepeatedlyIterator & operator++()
{
buffer = fn();
return *this;
}
CallRepeatedlyIterator()
: is_end(true)
{}
explicit CallRepeatedlyIterator(Fn f)
: is_end(false)
{
new (&fn) Fn(f);
new (&buffer) value_type(fn());
}
bool operator==(CallRepeatedlyIterator const & other) const
{
return is_end && other.is_end;
}
bool operator!=(CallRepeatedlyIterator const & other) const
{
return !(*this == other);
}
// NOTE: A destructor is missing here! It needs to destruct fn and buffer
// if its not the end iterator.
};
Now your begin member function returns such an iterator which calls produce (e.g. using a by reference capturing lambda) and end returns an "end" iterator.
This means that your for loop would run forever (no way to reach the end iterator)!
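Neither answer covers the "first n elements" part of the question. One way to get a loop that terminates is to let the iterator count how many values it may still produce. The sketch below is written against C++11 and rests on assumptions: DataClass is given an int member purely for the demonstration, and CountingGenerator, TakeIterator, TakeRange, take and first_values are names made up here.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

struct DataClass { int value; };  // simplified for the demonstration

class AbstractGenerator {
public:
    virtual ~AbstractGenerator() {}
    virtual DataClass produce() = 0;
};

// Hypothetical generator that yields 0, 1, 2, ...
class CountingGenerator : public AbstractGenerator {
    int next_ = 0;
public:
    DataClass produce() override { return DataClass{next_++}; }
};

// Input iterator that yields at most n values from a generator.
class TakeIterator {
    AbstractGenerator* gen_;
    std::size_t remaining_;
    DataClass current_;
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = DataClass;
    using difference_type = std::ptrdiff_t;
    using pointer = const DataClass*;
    using reference = const DataClass&;

    TakeIterator() : gen_(nullptr), remaining_(0), current_() {}  // "end" iterator
    TakeIterator(AbstractGenerator& gen, std::size_t n) : gen_(&gen), remaining_(n), current_() {
        if (remaining_ > 0) current_ = gen_->produce();  // fetch the first value lazily
    }
    reference operator*() const { return current_; }
    TakeIterator& operator++() {
        assert(remaining_ > 0);
        if (--remaining_ > 0) current_ = gen_->produce();  // never produces more than n values
        return *this;
    }
    // Only the number of remaining values matters, so an exhausted iterator equals "end".
    bool operator==(const TakeIterator& o) const { return remaining_ == o.remaining_; }
    bool operator!=(const TakeIterator& o) const { return !(*this == o); }
};

struct TakeRange {
    AbstractGenerator& gen;
    std::size_t n;
    TakeIterator begin() const { return TakeIterator(gen, n); }
    TakeIterator end() const { return TakeIterator(); }
};

inline TakeRange take(AbstractGenerator& gen, std::size_t n) { return TakeRange{gen, n}; }

inline std::vector<int> first_values(std::size_t n) {
    CountingGenerator gen;
    std::vector<int> out;
    for (const DataClass& d : take(gen, n)) out.push_back(d.value);
    return out;
}
```

The equality test is deliberately crude: it is only meant for comparing against the end iterator of the same range.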

Redefining operators for an iterator class - C++

I faced a problem of overloading the ->() operator while implementing the Iterator class. How should this operator be overloaded?
class iterator
{
private:
pair<Key_t, Val_t> p;
public:
iterator()
{
}
iterator(const iterator &i)
{
p = i.p;
}
iterator(Key_t key, Val_t v)
{
p = make_pair(key,v);
}
pair<const Key_t,Val_t>& operator *() const
{
return p;
}
iterator& operator = (const iterator &iter)
{
this->p = iter;
return *this;
}
};
I tried it this way, unsuccessfully:
&(pair<const Key_t,Val_t>&) operator ->() const
{
return &(**this);
}
This whole approach looks wrong.
An iterator isn't supposed to contain a value. It's supposed to contain at least the information necessary to locate a value inside the container and the information necessary to traverse to the next element within the container.
By storing a value inside the iterator, you cause unnecessary copies and lose the ability to update the container (change the value, remove the element from the container, etc).
For example, an iterator for a std::vector-like container might store a handle to the container and the index (offset) to the current item.
The only time an iterator would have a value itself is when you're implementing a generator that isn't actually associated with a container.
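As a rough sketch of the "handle plus index" idea, including one way to write operator->, consider the following. PairList and join_values are hypothetical names used only for illustration. Both operator* and operator-> refer to the element stored in the container, so changes made through the iterator are visible in the container:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal container used only to illustrate the idea.
class PairList {
public:
    using value_type = std::pair<int, std::string>;

    class iterator {
    public:
        iterator(PairList* owner, std::size_t index) : owner_(owner), index_(index) {}
        // Both operators refer to the element inside the container, not to a copy.
        value_type& operator*() const { return owner_->items_[index_]; }
        value_type* operator->() const { return &**this; }
        iterator& operator++() { ++index_; return *this; }
        bool operator==(const iterator& o) const { return owner_ == o.owner_ && index_ == o.index_; }
        bool operator!=(const iterator& o) const { return !(*this == o); }
    private:
        PairList* owner_;    // handle to the container
        std::size_t index_;  // position of the current element
    };

    void add(int key, std::string value) { items_.emplace_back(key, std::move(value)); }
    iterator begin() { return iterator(this, 0); }
    iterator end() { return iterator(this, items_.size()); }

private:
    std::vector<value_type> items_;
};

// Appends "!" to every value through the iterator and returns the concatenation.
inline std::string join_values(PairList& list) {
    std::string out;
    for (auto it = list.begin(); it != list.end(); ++it) {
        it->second += "!";  // writes through to the container
        out += it->second;
    }
    return out;
}
```

Note that operator-> returns a plain pointer and takes no arguments. The compiler applies -> to that pointer again to reach the member.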

Default Construction of valid Input Iterators

I am designing an input iterator type that enumerates all running processes in a system.
This is similar to an iterator I designed to enumerate modules in a process. The module iterator takes a 'process' object in the constructor, and a default constructed iterator is considered to be the off-the-end iterator.
Example:
hadesmem::ModuleIterator beg(process);
hadesmem::ModuleIterator end;
assert(beg != end);
I do not know what to do about process enumeration though, because there is no 'state' or information that needs to be given to the iterator (everything is handled internally by the iterator using the Windows API).
Example:
// This is obviously a broken design, what is the best way to distinguish between the two?
hadesmem::ProcessIterator beg;
hadesmem::ProcessIterator end;
What is the idiomatic way to deal with this situation? i.e. Where you need to distinguish between the creation of a 'new' iterator and an off-the-end iterator when nothing needs to be given to the iterator constructor.
If it's relevant, I am able to use C++11 in this library, as long as it's supported by VC11, GCC 4.7, and ICC 12.1.
Thanks.
EDIT:
To clarify, I know that it's not possible to distinguish between the two in the form I've posted above, so what I'm asking is more of a 'design' question than anything else... Maybe I'm just overlooking something obvious though (wouldn't be the first time).
What you really want to do is create a kind of ProcessList object, and base the iterators on that. I wouldn't want to be enumerating all processes or something every time I increment an iterator.
If you create a class that holds the parameters that go into the CreateToolhelp32Snapshot() representing the snapshot you're iterating over, you'll have a natural factory for the iterators. Something like this should work (I'm not on Windows, so not tested):
class Process;
class Processes {
DWORD what, who;
public:
Processes(DWORD what, DWORD who) : what(what), who(who) {}
class const_iterator {
HANDLE snapshot;
LPPROCESSENTRY32 it;
explicit const_iterator(HANDLE snapshot, LPPROCESSENTRY32 it)
: snapshot(snapshot), it(it) {}
public:
const_iterator() : snapshot(0), it(0) {}
// the two basic functions, implement iterator requirements with these:
const_iterator &advance() {
assert(snapshot);
if ( it && !Process32Next(snapshot, &it))
it = 0;
return *this;
}
const Process dereference() const {
assert(snapshot); assert(it);
return Process(it);
}
bool equals(const const_iterator & other) const {
return snapshot == other.snapshot && it == other.it; // the member is named snapshot, not handle
}
};
const_iterator begin() const {
const HANDLE snapshot = CreateToolhelp32Snapshot(what, who);
if (snapshot) {
LPPROCESSENTRY32 it;
if (Process32First(snapshot, &it))
return const_iterator(snapshot, it);
}
return end();
}
const_iterator end() const {
return const_iterator(snapshot, 0);
}
};
inline bool operator==(Processes::const_iterator lhs, Processes::const_iterator rhs) {
return lhs.equals(rhs);
}
inline bool operator!=(Processes::const_iterator lhs, Processes::const_iterator rhs) {
return !operator==(lhs, rhs);
}
Usage:
int main() {
const Processes processes( TH32CS_SNAPALL, 0 );
for ( const Process & p : processes )
// ...
return 0;
}
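Since the code above is untested and tied to the Windows API, here is a portable sketch of the same design: a list object acts as the factory, begin() takes a snapshot, and a default-constructed iterator is the end. Everything platform-specific is replaced by assumptions: take_snapshot is a hypothetical stand-in that returns fixed fake IDs, and ProcessId, Snapshot, ProcessList and list_processes are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

using ProcessId = unsigned long;
using Snapshot = std::shared_ptr<const std::vector<ProcessId>>;

// Hypothetical stand-in for the OS query; a real version would call the platform API.
inline std::vector<ProcessId> take_snapshot() { return {4, 8, 15}; }

class ProcessList {
public:
    class const_iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = ProcessId;
        using difference_type = std::ptrdiff_t;
        using pointer = const ProcessId*;
        using reference = ProcessId;

        const_iterator() = default;  // off-the-end iterator: no snapshot
        explicit const_iterator(Snapshot snapshot) : snapshot_(std::move(snapshot)) {
            if (snapshot_ && snapshot_->empty()) snapshot_.reset();  // empty snapshot: already at end
        }
        ProcessId operator*() const { assert(snapshot_); return (*snapshot_)[index_]; }
        const_iterator& operator++() {
            assert(snapshot_);
            if (++index_ == snapshot_->size()) {  // ran off the snapshot: become "end"
                snapshot_.reset();
                index_ = 0;
            }
            return *this;
        }
        friend bool operator==(const const_iterator& a, const const_iterator& b) {
            return a.snapshot_ == b.snapshot_ && a.index_ == b.index_;
        }
        friend bool operator!=(const const_iterator& a, const const_iterator& b) { return !(a == b); }

    private:
        Snapshot snapshot_;      // shared so that copies of the iterator stay cheap
        std::size_t index_ = 0;
    };

    // Each call to begin() takes a fresh snapshot; incrementing never re-enumerates.
    const_iterator begin() const {
        return const_iterator(std::make_shared<std::vector<ProcessId>>(take_snapshot()));
    }
    const_iterator end() const { return const_iterator(); }
};

inline std::vector<ProcessId> list_processes() {
    ProcessList processes;
    std::vector<ProcessId> out;
    for (ProcessId id : processes) out.push_back(id);
    return out;
}
```

Iterators obtained from two different begin() calls refer to different snapshots and compare unequal, which is acceptable for input iterators.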
You could use the named constructor idiom.
class ProcessIterator {
private:
ProcessIterator(int); //begin iterator
ProcessIterator(char); //end iterator
//no default constructor, to prevent mistakes
public:
// static member functions (not friends), so they can be called as ProcessIterator::begin()
static ProcessIterator begin() {return ProcessIterator(0);}
static ProcessIterator end() {return ProcessIterator('\0');}
};
int main() {
for(auto it=ProcessIterator::begin(); it!=ProcessIterator::end(); ++it) {
//stuff
}
}