C++ custom lazy iterator - c++

I have a somewhat simple text file parser. The text I parse is split into blocks denoted by { block data }.
My parser has a string read() function, which gets tokens back, such that in the example above the first token is { followed by block followed by data followed by }.
To make things less repetitive, I want to write a generator-like iterator that will allow me to write something similar to this JavaScript code:
* readBlock() {
this.read(); // {
let token = this.read();
while (token !== '}') {
yield token;
token = this.read();
}
}
which in turn allows me to use simple for-of syntax:
for (let token of parser.readBlock()) {
// block
// data
}
For C++ I would like something similar:
for (string token : reader.read_block())
{
// block
// data
}
I googled around to see if this can be done with an iterator, but I couldn't figure out whether I can have a lazy iterator like this, one that has no defined beginning or end. That is, its beginning is the current position of the reader (an integer offset into a vector of characters), and its end is when the token } is found.
I don't need to construct arbitrary iterators, or to iterate in reverse, or to see if two iterators are equal, since it's purely to make linear iteration less repetitive.
Currently every time I want to read a block, I need to re-write the following:
stream.skip(); // {
while ((token = stream.read()) != "}")
{
// block
// data
}
This becomes very messy, especially when I have blocks inside blocks. To support blocks inside blocks, the iterators would have to all reference the same reader's offset, such that an inner block will advance the offset, and the outer block will re-start iterating (after the inner is finished) from that advanced offset.
Is this possible to achieve in C++?

In order to be usable in a for-range loop, a class has to have member functions begin() and end() which return iterators.
What is an iterator? Any object fulfilling a set of requirements. There are several kinds of iterators, depending on which operations they support. I suggest implementing an input iterator, which is the simplest: https://en.cppreference.com/w/cpp/named_req/InputIterator
class Stream
{
public:
std::string read() { /**/ }
bool valid() const { /* return true while more tokens are available */ }
};
class FileParser
{
std::string current_;
Stream* stream_;
public:
class iterator
{
FileParser* obj_;
public:
using value_type = std::string;
using reference = const std::string&;
using pointer = const std::string*;
using iterator_category = std::input_iterator_tag;
iterator(FileParser* obj=nullptr): obj_ {obj} {}
reference operator*() const { return obj_->current_; }
iterator& operator++() { increment(); return *this; }
iterator operator++(int) { increment(); return *this; }
bool operator==(iterator rhs) const { return obj_ == rhs.obj_; }
bool operator!=(iterator rhs) const { return !(rhs==*this); }
protected:
void increment()
{
obj_->next();
if (!obj_->valid())
obj_ = nullptr;
}
};
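FileParser(Stream& stream): stream_ {&stream} {}
// Load the first token so that *begin() is meaningful; an exhausted stream yields end().
iterator begin() { next(); return valid() ? iterator{this} : iterator{}; }
iterator end() { return iterator{}; }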
void next() { current_ = stream_->read(); }
bool valid() const { return stream_->valid(); }
};
So your end-of-file iterator is represented by an iterator pointing to no object.
Then you can use it like this:
int main()
{
Stream s; // Initialize it as needed
FileParser parser {s};
for (const std::string& token: parser)
{
std::cout << token << std::endl;
}
}
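The parser above runs until the stream is exhausted, whereas the question asks for a range that stops at the closing }. A minimal sketch of that variant follows. Reader, BlockRange and collect are hypothetical names, and the tokens are pre-split into a vector purely for illustration. Every iterator holds a pointer to the same reader, so a nested read_block() advances the shared offset and the outer loop resumes after it:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's reader: tokens are pre-split here,
// and pos_ plays the role of the shared offset.
class Reader {
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
public:
    explicit Reader(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    // Returns the next token, or "}" at end of input so an unterminated block still ends.
    std::string read() { return pos_ < tokens_.size() ? tokens_[pos_++] : "}"; }

    class BlockRange;
    BlockRange read_block();
};

class Reader::BlockRange {
    Reader* reader_;
public:
    class iterator {
        Reader* reader_;       // nullptr marks the end iterator
        std::string current_;
        void advance() {
            current_ = reader_->read();
            if (current_ == "}") reader_ = nullptr;  // closing brace ends the block
        }
    public:
        explicit iterator(Reader* r = nullptr) : reader_(r) { if (reader_) advance(); }
        const std::string& operator*() const { return current_; }
        iterator& operator++() { assert(reader_ != nullptr); advance(); return *this; }
        bool operator!=(const iterator& rhs) const { return reader_ != rhs.reader_; }
    };

    explicit BlockRange(Reader* r) : reader_(r) {}
    iterator begin() { return iterator(reader_); }
    iterator end() { return iterator(); }
};

inline Reader::BlockRange Reader::read_block() {
    read();  // consume the opening "{"
    return BlockRange(this);
}

// Collects the tokens of a block, descending into the block that follows "inner".
std::vector<std::string> collect(Reader& reader) {
    std::vector<std::string> out;
    for (const std::string& token : reader.read_block()) {
        if (token == "inner") {
            for (const std::string& nested : reader.read_block())
                out.push_back("inner:" + nested);
        } else {
            out.push_back(token);
        }
    }
    return out;
}
```

Only operator!=, prefix operator++ and operator* are provided, which is all a range-based for loop needs. It is not a full input iterator.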

Related

C++ - Overloading of operators needed for an iterator

I'm trying to create an iterator on a library that allows reading a specific file format.
From the docs, to read the file content you need do something like this:
CKMCFile database;
if (!database.OpenForListing(path)) {
std::cerr << "ERROR: unable to open " << path << std::endl;
}
CKMCFileInfo info;
database.Info(info);
CKmerAPI kmer(info.kmer_length);
uint32 cnt;
std::vector<uint64_t> data;
std::vector<uint64> ulong_kmer;
data.reserve(info.total_kmers);
while (database.ReadNextKmer(kmer, cnt)) {
kmer.to_long(ulong_kmer);
data.push_back(ulong_kmer[0]);
}
Now, I started with this class wrapper:
class FileWrapper {
CKMCFile database;
CKMCFileInfo info;
Iterator _end;
public:
explicit FileWrapper(const std::string &path) {
if (!database.OpenForListing(path)) {
std::cout << "ERROR: unable to open " << path << std::endl;
}
database.Info(info);
}
Iterator begin() {
Iterator it;
it.database = &database;
it.total = 0;
uint32_t cnt;
std::vector<uint64_t> ulong_kmer;
CKmerAPI tmp(info.kmer_length);
database.ReadNextKmer(tmp, cnt);
tmp.to_long(ulong_kmer);
return it;
}
Iterator end() const { return _end; }
uint64_t size() { return info.total_kmers; }
};
And then, this is the Iterator class:
class Iterator {
friend class FileWrapper;
CKMCFileInfo info;
CKMCFile *database;
uint64_t kmer, total;
public:
Iterator &operator++() {
++total;
uint32_t cnt;
std::vector<uint64_t> ulong_kmer;
CKmerAPI tmp(info.kmer_length);
database->ReadNextKmer(tmp, cnt);
tmp.to_long(ulong_kmer);
return *this;
}
bool operator<(const Iterator &rhs) const { return total < rhs.total; }
uint64_t operator*() const { return kmer; }
};
But during some tests I found that I can't use it in a for loop such as for (auto it = begin(); it != end(); ++it) { ... }, nor can I write begin() + size(). How can I correctly overload these two operators, operator!= and operator+?
You'll have to think about three major things first:
Ownership. Currently, you have to make sure your FileWrapper survives at least as long as any Iterator returned from it by calling its begin() (since your Iterators store pointers to data owned by the FileWrapper object). If you cannot guarantee that, maybe think about using unique_ptrs or shared_ptrs
Iterator Category. As discussed in the comments, it appears that your database requires you to use "input iterators". They can only be incremented by one (do not provide operator+(int)) and dereferenced. Indeed, what would the iterator begin() + 10 look like? If this should advance your file-pointer, then you cannot define the end as begin() + size() as that would just skip through the file.
Representation. What should an end-iterator look like? A simple choice might be to indicate the end with database == nullptr. In this case, an operator!= might look like this:
bool is_end() const { return database == nullptr; }
bool operator==(const Iterator& other) const {
if(is_end()) return other.is_end();
if(other.is_end()) return false;
return (database == other.database) && (total == other.total);
}
bool operator!=(const Iterator& other) const { return !operator==(other); }
Now, you'll need code that ensures that all end-iterators have database == nullptr and, whenever a non-end iterator becomes an end-iterator by application of operator++(), you'll need to set database = nullptr and total = 0 (or something similar).
A note at the end: your Iterators may be in an inconsistent state after construction and before assignment of their database member. It is prudent to declare a proper constructor for Iterator that initializes its members.
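The points above (input-iterator category, an end state marked by a null pointer, and a constructor that initializes every member) can be sketched as follows. MockDatabase and its read_next() are hypothetical stand-ins that only mimic the shape of a forward-only ReadNextKmer-style call. This is not the KMC API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical forward-only source; not part of any real library.
class MockDatabase {
    std::vector<std::uint64_t> records_;
    std::size_t next_ = 0;
public:
    explicit MockDatabase(std::vector<std::uint64_t> r) : records_(std::move(r)) {}
    // Fills `out` and returns true, or returns false once the data is exhausted.
    bool read_next(std::uint64_t& out) {
        if (next_ == records_.size()) return false;
        out = records_[next_++];
        return true;
    }
};

class RecordIterator {
    MockDatabase* database_ = nullptr;  // nullptr marks the end iterator
    std::uint64_t value_ = 0;
    std::uint64_t total_ = 0;

    // Reads one record; on failure the iterator turns into the end iterator.
    void fetch() {
        if (!database_->read_next(value_)) { database_ = nullptr; total_ = 0; }
    }
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::uint64_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::uint64_t*;
    using reference = const std::uint64_t&;

    RecordIterator() = default;  // end iterator
    explicit RecordIterator(MockDatabase& db) : database_(&db) { fetch(); }

    reference operator*() const { return value_; }
    RecordIterator& operator++() {
        assert(database_ != nullptr && "cannot advance an end iterator");
        ++total_;
        fetch();
        return *this;
    }
    RecordIterator operator++(int) { RecordIterator tmp(*this); ++*this; return tmp; }

    bool operator==(const RecordIterator& o) const {
        return database_ == o.database_ && total_ == o.total_;
    }
    bool operator!=(const RecordIterator& o) const { return !(*this == o); }
};

// Drains the source through the iterator pair.
std::vector<std::uint64_t> read_all(MockDatabase& db) {
    return std::vector<std::uint64_t>(RecordIterator(db), RecordIterator());
}
```

There is deliberately no operator+. Advancing consumes the underlying file, so begin() + size() cannot serve as an end marker.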
EDIT: here's a suggestion for an integration

Usecase for weak_ptr in C++ primer 5th Edition

Hi, I am reading C++ Primer, 5th Edition, and have some doubts about the section on weak_ptr. It is written that
By using a weak_ptr, we don’t affect the lifetime of the vector to which a given StrBlob points. However, we can prevent the user from attempting to access a vector that no longer exists.
Then they have given the following code as an example:
#include<iostream>
#include<string>
#include<vector>
#include<memory>
#include<initializer_list>
using namespace std;
class StrBlobPtr;
class StrBlob {
friend class StrBlobPtr;
public:
typedef std::vector<std::string>::size_type size_type;
StrBlob():data(std::make_shared<std::vector<std::string>>()){
}
StrBlob(std::initializer_list<std::string> il):data(make_shared<vector<std::string>>(il)){
}
size_type size() const {
return data->size();
}
bool empty() const {
return data->empty();
}
void push_back(const std::string &t){
data->push_back(t);
}
std::string& front(){
check(0,"front on empty StrBlob");
return data->front();
}
std::string& front() const{
check(0,"front on const empty StrBlob");
return data->front();
}
std::string& back(){
check(0,"back on empty StrBlob");
return data->back();
}
std::string& back() const {
check(0,"back on const empty StrBlob");
return data->back();
}
void pop_back(){
check(0,"pop_back on empty StrBlob");
data->pop_back();
}
private:
std::shared_ptr<std::vector<std::string>> data;
void check(size_type i, const std::string &msg) const{
if(i >= data->size()){
throw out_of_range(msg);
}
}
StrBlobPtr begin();
StrBlobPtr end();
};
class StrBlobPtr {
public:
typedef std::vector<std::string>::size_type size_type;
StrBlobPtr():curr(0){
}
StrBlobPtr(StrBlob &a, size_type sz = 0):wptr(a.data), curr(sz){
}
std::string& deref() const {
auto p = check(curr, "dereference past end");
return (*p)[curr];
}
StrBlobPtr& incr(){
check(curr, "increment past end of StrBlobPtr");
++curr;
return *this;
}
std::shared_ptr<std::vector<std::string>> check(std::size_t i, const std::string &msg) const{
auto ret = wptr.lock();
if(!ret){
throw std::runtime_error("unbound StrBlobPtr");
}
if(i>= ret->size()){
throw std::out_of_range(msg);
}
return ret;
}
private:
std::weak_ptr<std::vector<std::string>> wptr;
size_type curr;
};
StrBlobPtr StrBlob::begin() {
return StrBlobPtr(*this);
}
StrBlobPtr StrBlob::end() {
auto ret = StrBlobPtr(*this, data->size());
return ret; // without this return, using end() is undefined behavior
}
int main(){
return 0;
}
My questions are as follows:
How can we prevent the user from attempting to access a vector that no longer exists? I can't come up with a use case; how does the statement quoted above apply to this example?
How does this example show or verify that we can prevent the user from attempting to access a vector that no longer exists? If this example does not show what they have written, then why is it in the book? Note that I have written "if".
1. How can we prevent the user from attempting to access a vector that no longer exists?
We can prevent it by exchanging a weak_ptr for a shared_ptr. weak_ptr::lock() does that. It atomically checks if the pointed-to object still exists and increments the corresponding shared_ptr ref count, thus "blocking" any possible deletion from that point on.
So after this line:
auto ret = wptr.lock();
ret will be a shared_ptr that either owns the object or doesn't, and that fact will not change for as long as ret exists.
Then with a simple test you can safely check if there is an object or not:
if(!ret){
/* no object anymore */
}
At the end the function does return ret;, which returns a copy of it, thus still preventing an object from being deleted (ref count is again incremented and then decremented). So as long as you own an instance of shared_ptr, you can rest assured the object will continue to exist.
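The lock-then-test pattern described above can be tried in isolation. This is a minimal sketch; StringVec, size_if_alive and demo are names made up for the illustration:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using StringVec = std::vector<std::string>;

// Returns the element count if the vector is still alive, or -1 if it has been destroyed.
long size_if_alive(const std::weak_ptr<StringVec>& observer) {
    std::shared_ptr<StringVec> owner = observer.lock();  // empty if the object is gone
    if (!owner) return -1;
    // While `owner` exists, the vector cannot be destroyed.
    return static_cast<long>(owner->size());
}

void demo() {
    auto data = std::make_shared<StringVec>(StringVec{"a", "b"});
    std::weak_ptr<StringVec> observer = data;  // does not extend the lifetime

    assert(size_if_alive(observer) == 2);
    data.reset();                              // last owner goes away
    assert(observer.expired());
    assert(size_if_alive(observer) == -1);     // the access is refused, not undefined
}
```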
However, here we have a problem:
std::string& deref() const {
auto p = check(curr, "dereference past end");
return (*p)[curr];
}
This returns a reference to std::string inside a vector which, after p goes out of scope is held only by weak_ptr, i.e. a potentially dangling reference (which is no different from a dangling pointer).
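One way to sidestep the dangling reference, at the cost of a copy, is to return the string by value while the locked shared_ptr is still in scope. The sketch below is not the book's code; CheckedPtr and deref_fails are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

using StringVec = std::vector<std::string>;

// Hypothetical variant of StrBlobPtr whose deref() returns a copy, not a reference.
class CheckedPtr {
    std::weak_ptr<StringVec> wptr_;
    std::size_t curr_;
public:
    explicit CheckedPtr(const std::shared_ptr<StringVec>& data, std::size_t pos = 0)
        : wptr_(data), curr_(pos) {}

    std::string deref() const {
        std::shared_ptr<StringVec> p = wptr_.lock();
        if (!p) throw std::runtime_error("unbound CheckedPtr");
        if (curr_ >= p->size()) throw std::out_of_range("dereference past end");
        return (*p)[curr_];  // copied while p keeps the vector alive
    }
};

// Reports whether deref() throws std::runtime_error, i.e. the vector is gone.
bool deref_fails(const CheckedPtr& ptr) {
    try { ptr.deref(); } catch (const std::runtime_error&) { return true; }
    return false;
}
```

Another option would be to hand the caller the shared_ptr itself, so that the caller decides how long the vector stays alive.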
2. How does this example shows/verifies that we can prevent the user from attempting to access a vector that no longer exists?
Apparently it doesn't. Just ignore it.

C++ OOP: Class knows its index in the container - prevent overwrite?

I have a class idx_aware that goes into a container container, which wraps around a std::vector. When the class is added to container, container sets a pointer to itself in idx_aware, as well as the index of idx_aware in its internal memory storage.
The index is not going to change until the container is destroyed or idx_aware is removed; idx_aware needs to know about its container and its index, because it has some methods that require both to work.
Now this introduces the following problem: when I get a non-const reference to an idx_aware class contained in container, I could assign to it another idx_aware class, which could have a different index. The intention would be assigning all the fields and keeping the index as it is.
#include <vector>
#include <limits>
#include <iostream>
class container;
// Stores a std::size_t field, which can be set only by subclasses.
class with_idx {
std::size_t _i;
public:
with_idx() : _i(std::numeric_limits<std::size_t>::max()) {}
operator std::size_t() const { return _i; }
protected:
void set_idx(std::size_t i) { _i = i; }
};
// Knows its index and its container
class idx_aware : public with_idx {
container const *_container;
int _some_field1;
float _some_field2;
public:
void foo() {
// Do stuff using _container and _i
}
private:
friend class container;
};
// Wraps around a std::vector
class container {
std::vector<idx_aware> _data;
public:
idx_aware &operator[](std::size_t idx) {
// Need non-const access to call foo
return _data[idx];
}
idx_aware const &operator[](std::size_t idx) const {
return _data[idx];
}
std::size_t add(idx_aware const &item) {
// Here it could potentially reuse a freed position
std::size_t free_slot = _data.size();
// Ensure _data is big enough to contain free_slot
if (_data.size() <= free_slot) {
_data.resize(free_slot + 1);
}
// Assign
_data[free_slot] = item;
_data[free_slot].set_idx(free_slot);
_data[free_slot]._container = this;
return free_slot;
}
};
int main() {
container c;
idx_aware an_item;
std::size_t i = c.add(an_item);
std::cout << c[i] << std::endl; // Prints 0
idx_aware another_item; // Created from somewhere else
// I want to set all the data in idx_aware, but the
// index should stay the same!
c[i] = another_item;
std::cout << c[i] << std::endl; // Prints numeric_limits<size_t>::max()
// Now container[i] is broken because it doesn't know anymore its index.
return 0;
}
One possible workaround would be to change with_idx in such a way that when set_idx is called, a flag is set that prevents assignment and copy operator to overwrite the _i property, like this:
class with_idx {
std::size_t _i;
bool _readonly;
public:
with_idx() : _i(std::numeric_limits<std::size_t>::max()), _readonly(false) {}
with_idx(with_idx const &other) : _i(other._i), _readonly(false) {}
with_idx &operator=(with_idx const &other) {
if (!_readonly) {
_i = other._i;
}
return *this;
}
operator std::size_t() const { return _i; }
protected:
void set_idx(std::size_t i) {
_i = i;
if (i != std::numeric_limits<std::size_t>::max()) {
// This has been set by someone with the right to do so,
// prevent overwriting
_readonly = true;
} else {
// Removed from the container, allow overwriting
_readonly = false;
}
}
};
This would have the consequence of returning, after assignment, a reference to an idx_aware class with unchanged index.
idx_aware &not_in_container1 = /* ... */;
idx_aware &not_in_container2 = /* ... */;
idx_aware &in_container = /* ... */;
not_in_container1 = in_container = not_in_container2;
// std::size_t(not_in_container_1) != std::size_t(not_in_container_2)
Is there a design pattern that can model this situation in a better way? My searches were not successful.
Are there other unwanted consequences of overriding the assignment operator in this way? The limitation I pointed out in the previous example does not look too "bad".
Is there an easier solution? I thought about writing some proxy object to replace the idx_aware & return type of operator[].
Experience shows that when C++ does not do what you intend, you are likely to be misusing OOP...
Robert's comment suggested this solution to me.
Why would the contained object know about its container? To be able to perform actions such as foo and provide shorthand methods that otherwise would require to have access to the container.
Let's take this functionality away from the contained object; the contained object is just data payload. Instead, let's make operator[] return not the contained object, but some sort of iterator, a wrapper around the contained object, which knows the container and the index, and once dereferenced returns the actual contained object.
class was_idx_aware {
int _some_field1;
float _some_field2;
};
class container {
std::vector<was_idx_aware> _data;
public:
class idx_aware_wrapper {
container *_container; // non-const, so that operator*() can hand out a mutable reference
std::size_t _idx;
public:
idx_aware_wrapper(container &c, std::size_t i)
: _container(&c)
, _idx(i)
{}
was_idx_aware const &operator*() const {
return _container->_data[_idx];
}
was_idx_aware &operator*() {
return _container->_data[_idx];
}
void foo() {
// Do stuff using _container and _idx.
}
};
idx_aware_wrapper operator[](std::size_t i) {
return idx_aware_wrapper(*this, i);
}
/* .... */
};
This allows quick access to any data in was_idx_aware, and the wrapper class can be augmented with all the methods that require interaction with the container. No need to store and keep indices up to date or override assignment operators.
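A compact, compilable version of the same idea is sketched below, under the assumption that the payload is a plain struct. The names payload, handle and is_last are made up for the example. Assigning through the handle replaces only the payload, and the index cannot be overwritten because it lives in the handle:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain data; it knows nothing about its container or its index.
struct payload {
    int field1;
    float field2;
};

class container {
    std::vector<payload> data_;
public:
    // Lightweight proxy that carries the container and the index.
    class handle {
        container* container_;
        std::size_t idx_;
    public:
        handle(container& c, std::size_t i) : container_(&c), idx_(i) {}
        payload& operator*() const { return container_->data_[idx_]; }
        payload* operator->() const { return &container_->data_[idx_]; }
        std::size_t index() const { return idx_; }
        // Example of an operation that needs both the container and the index.
        bool is_last() const { return idx_ + 1 == container_->data_.size(); }
    };

    std::size_t add(const payload& item) {
        data_.push_back(item);
        return data_.size() - 1;
    }
    handle operator[](std::size_t i) {
        assert(i < data_.size());
        return handle(*this, i);
    }
};
```

A handle is cheap to copy, but it stays valid only while the container is alive and the slot it refers to still exists.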

vector of lists + iterator CPP

I am trying to implement insertion of a word into a chained hashtable.
The problem is that I am trying to insert an object that has 2 fields, and I need access to one of them through an iterator. The problem seems to be with the iterator it, as the code in the for loop doesn't work. I also overloaded operator== in Vocabolo.cpp to make it work for my case.
I also have a problem with the size of the vector: can I use a #define? It seems not. Any advice, please?
I declared my vector of list + iterator in the header file as :
vector<list<Vocabolo>> hash;
list<Vocabolo>::iterator it;
this is part of the class Vocabolo :
class Vocabolo {
public:
Vocabolo();
~Vocabolo();
void setVocabolo(Vocabolo);
string getVocabolo();
bool operator== (Vocabolo);
string termine;
string tipo;
};
this is the overloaded method operator==:
bool Vocabolo::operator== (Vocabolo x) {
return getVocabolo() == x.termine;
}
the method that is not working!
bool HashV::Insert(Vocabolo nuovo) {
key = this->HashUniversale(nuovo.termine);
for (it = this->hash[key].begin(); it != this->hash[key].end(); it++)
if (it->termine == nuovo.termine)
return false;
else {
hash[key].push_back(nuovo);
return true;
}
}
Consider using std::find_if instead. The predicate receives a Vocabolo (the list's element type), and the new word is inserted only when no match is found:
auto& bucket = this->hash[key];
auto itVoca = std::find_if(bucket.begin(), bucket.end(), [&nuovo](const Vocabolo& v)
{
return v.termine == nuovo.termine;
});
bool found = itVoca != bucket.end();
if (!found) bucket.push_back(nuovo);
return !found;
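Put together as a self-contained sketch, the insertion could look like the code below. Several things here are assumptions and not taken from the question: Vocabolo is reduced to a plain struct, std::hash stands in for HashUniversale, and the bucket count is passed to the constructor. Sizing the vector in the constructor's initializer list also covers the question about the vector size without needing a #define:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

struct Vocabolo {
    std::string termine;
    std::string tipo;
};

class HashV {
    std::vector<std::list<Vocabolo>> hash_;

    // Assumption: std::hash replaces the question's HashUniversale.
    std::size_t bucket_of(const std::string& s) const {
        return std::hash<std::string>{}(s) % hash_.size();
    }
public:
    // The vector is sized once, here; each bucket starts as an empty list.
    explicit HashV(std::size_t buckets) : hash_(buckets) { assert(buckets > 0); }

    // Returns false if a word with the same termine is already stored.
    bool Insert(const Vocabolo& nuovo) {
        std::list<Vocabolo>& bucket = hash_[bucket_of(nuovo.termine)];
        auto it = std::find_if(bucket.begin(), bucket.end(),
            [&nuovo](const Vocabolo& v) { return v.termine == nuovo.termine; });
        if (it != bucket.end()) return false;  // duplicate
        bucket.push_back(nuovo);               // reached for empty buckets too
        return true;
    }

    std::size_t size() const {
        std::size_t n = 0;
        for (const auto& bucket : hash_) n += bucket.size();
        return n;
    }
};
```

Unlike the loop in the question, the insertion happens after the whole bucket has been searched, so it does not return after inspecting only the first element, and an empty bucket is handled as well.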

Trouble implementing a line-by-line file parser in C++

I have trouble implementing a simple file parser in C++11 which reads a file line by line and tokenizes the line. It should properly manage its resources. Usage of the parser should be like:
Parser parser;
parser.open("/path/to/file");
std::pair<int, int> header = parser.getHeader();
while (parser.hasNext()) {
std::vector<int> tokens = parser.getNext();
}
parser.close();
So the Parser class needs one member std::ifstream file (or std::ifstream* file?)
1) How should the constructor initialize this->file?
2) How should the open method set this->file to the input file?
3) How should the next line from the file get loaded into a string?
(Is this what you would use: std::getline(this->file, line)) ?
Can you give some advice? Ideally, could you sketch out the class as a code example.
Since the Parser is probably in a pretty useless state once you've constructed it and before you've opened the file, I would suggest having your use case look something like this:
Parser parser("/path/to/file");
std::pair<int, int> header = parser.getHeader();
while (parser.hasNext()) {
std::vector<int> tokens = parser.getNext();
}
parser.close();
In which case, you should use the constructor's member initialization list to initialise the file member (which, yes, should be of type std::ifstream):
Parser::Parser(std::string file_name)
: file(file_name)
{
// ...
}
If you kept the constructor and open member function separate, you could just leave the constructor as default because the file member will be default constructed giving you a file stream that is not associated with any file. You would then get Parser::open to forward the file name to std::ifstream::open, like so:
void Parser::open(std::string file_name)
{
file.open(file_name);
}
Then, yes, to read lines from the file, you want to use something similar to this:
std::string line;
while (std::getline(file, line)) {
// Do something with line
}
Good job not falling into the trap of writing while (!file.eof()).
It can be designed in many ways.
You may ask the user to provide a stream instead of specifying a filename.
That is more generic and works with any input stream.
In that case you would keep a std::istream& member variable. A pointer type works as well, but then you need to dereference it (*_stream >> value) to invoke any operator.
If you take a file name instead, you may construct the stream in your constructor and close it, if it is open, in the destructor.
Actually, there is an alternative to feeding the name of the file to Parser: you could feed it a std::istream. What's interesting in this is that this way any derived class of std::istream can be used, and thus you could feed it, for example, a std::istringstream, which makes it easier to write unit-tests.
class Parser {
public:
explicit Parser(std::istream& is);
/**/
private:
std::istream& _stream;
/**/
};
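To illustrate the testability point, here is a minimal sketch of a parser over a std::istream& that keeps the hasNext()/getNext() interface from the question. The name LineParser and the whitespace-separated integer format are assumptions. It reads one line ahead so that hasNext() never has to touch the stream:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

class LineParser {
    std::istream& stream_;
    std::string line_;      // the line that the next getNext() call will tokenize
    bool has_line_ = false;

    // Reads one line ahead and remembers whether that succeeded.
    void fetch() { has_line_ = static_cast<bool>(std::getline(stream_, line_)); }
public:
    explicit LineParser(std::istream& is) : stream_(is) { fetch(); }

    bool hasNext() const { return has_line_; }

    // Tokenizes the buffered line into integers, then reads the following line.
    std::vector<int> getNext() {
        assert(has_line_ && "getNext() called with no line left");
        std::vector<int> tokens;
        std::istringstream fields(line_);
        int value;
        while (fields >> value) tokens.push_back(value);
        fetch();
        return tokens;
    }
};
```

In a unit test the parser can be fed a std::istringstream, and in production a std::ifstream opened by the caller. Either way the stream has to outlive the parser, since only a reference is stored.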
Next comes iteration. It is not idiomatic in C++ to have a hasNext() call followed by a getNext() call. std::istream supports iteration (with an input iterator), and you could perfectly well design your parser so that it does too. This way you will have the benefit of compatibility with many STL algorithms.
class ParserIterator:
public std::iterator< std::input_iterator_tag, std::vector<int> >
{
public:
ParserIterator(): _stream(nullptr) {} // end
ParserIterator(std::istream& is): _stream(&is) { this->advance(); }
// Accessors
std::vector<int> const& operator*() const { return _vec; }
std::vector<int> const* operator->() const { return &_vec; }
bool equals(ParserIterator const& other) const {
if (_stream != other._stream) { return false; }
if (_stream == nullptr) { return true; }
return false;
}
// Modifiers
ParserIterator& operator++() { this->advance(); return *this; }
ParserIterator operator++(int) {
ParserIterator tmp(*this);
this->advance();
return tmp;
}
private:
void advance() {
assert(_stream && "cannot advance an end iterator");
_vec.clear();
std::string buffer;
if (not getline(*_stream, buffer)) {
_stream = nullptr; // end of story
return; // nothing left to parse
}
// parse here
}
std::istream* _stream;
std::vector<int> _vec;
}; // class ParserIterator
inline bool operator==(ParserIterator const& left, ParserIterator const& right) {
return left.equals(right);
}
inline bool operator!= (ParserIterator const& left, ParserIterator const& right) {
return not left.equals(right);
}
And with that we can augment our parser:
ParserIterator Parser::begin() const {
return ParserIterator(_stream);
}
ParserIterator Parser::end() const {
return ParserIterator();
}
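As an illustration of the compatibility with STL algorithms, the sketch below passes such an iterator to std::for_each. It is a variant, not the code above: the name LineIterator, the integer-per-line format and sum_all are assumptions, and the member typedefs are spelled out by hand because the std::iterator base class is deprecated since C++17:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

class LineIterator {
    std::istream* stream_ = nullptr;  // nullptr marks the end iterator
    std::vector<int> values_;

    void advance() {
        std::string buffer;
        if (!std::getline(*stream_, buffer)) {
            stream_ = nullptr;  // no more lines: become the end iterator
            values_.clear();
            return;
        }
        std::istringstream fields(buffer);
        values_.assign(std::istream_iterator<int>(fields), std::istream_iterator<int>());
    }
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::vector<int>;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::vector<int>*;
    using reference = const std::vector<int>&;

    LineIterator() = default;  // end iterator
    explicit LineIterator(std::istream& is) : stream_(&is) { advance(); }

    reference operator*() const { return values_; }
    pointer operator->() const { return &values_; }
    LineIterator& operator++() {
        assert(stream_ != nullptr && "cannot advance an end iterator");
        advance();
        return *this;
    }
    LineIterator operator++(int) { LineIterator tmp(*this); ++*this; return tmp; }

    friend bool operator==(const LineIterator& a, const LineIterator& b) {
        return a.stream_ == b.stream_;
    }
    friend bool operator!=(const LineIterator& a, const LineIterator& b) {
        return !(a == b);
    }
};

// Sums every integer in the stream, one line at a time.
long sum_all(std::istream& is) {
    long total = 0;
    std::for_each(LineIterator(is), LineIterator(),
        [&total](const std::vector<int>& line) {
            total += std::accumulate(line.begin(), line.end(), 0L);
        });
    return total;
}
```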
I'll leave the getHeader method and the actual parsing content to you ;)