Implementing a split() function - c++

Accelerated C++, exercise 14.5 involves reimplementing a split function (which turns text input into a vector of strings). One must store input in a std::string-like class (Str) and use the split function to return a Vec<Str>, Vec being a std::vector-like container. The Str class manages a custom pointer (Ptr) to the underlying Vec<char> data.
The Str class provides a constructor Str(i,j), in Str.h below, which constructs a Ptr to the underlying Vec.
The problem arises when I try to create substrings by calling str(i,j).
I've detailed in the code where the issues arise.
Here is a whittled-down version of the Str class (can post more code if needed):
Str.h
#include "Ptr.h"
#include "Vec.h"
class Str {
friend std::istream& operator>>(std::istream&, Str&);
public:
// define iterators
typedef char* iterator;
typedef char* const_iterator;
iterator begin() { return data->begin(); }
const_iterator begin() const { return data->begin(); }
iterator end() { return data->end(); }
const_iterator end() const { return data->end(); }
//** This is where we define a constructor for `Ptr`s to substrings **
template<class In> Str(In i, In j): data(new Vec<char>) {
std::copy(i, j, std::back_inserter(*data));
}
private:
// store a Ptr to a Vec
Ptr< Vec<char> > data;
};
Split.h
Vec<Str> split(const Str& str) {
typedef Str::const_iterator iter;
Vec<Str> ret;
iter i = str.begin();
while (i != str.end()) {
// ignore leading blanks
i = find_if(i, str.end(), not_space);
// find end of next word
iter j = find_if(i, str.end(), space);
// copy the characters in [i, j)
if (i != str.end())
ret.push_back(**substring**); // Need to create substrings here
// call to str(i,j) gives error, detailed below
i = j;
}
return ret;
}
My first thought was to use this constructor to create (pointers to) the required substrings. Calling str(i,j) here gives the error message
type 'const Str' does not provide a call operator
It appears as if one cannot simply call str(i,j) here. Why not?
Could a solution be to write a Str member function which is similar to substr?

Related

Is there a standard C++ iterator for C strings?

Sometimes I need to pass a C string to a function using the common C++ iterator range interface [first, last). Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string or call strlen()?
EDIT:
I know I can use a pointer as an iterator, but I would have to know where the string ends, which would require me to call strlen().
EDIT2:
While I didn't know if such iterator is standardized, I certainly know it is possible. Responding to the sarcastic answers and comments, this is the stub (incomplete, untested):
class CStringIterator
{
public:
CStringIterator(char *str=nullptr):
ptr(str)
{}
bool operator==(const CStringIterator& other) const
{
if(other.ptr) {
return ptr == other.ptr;
} else {
return !*ptr;
}
}
/* ... operator++ and other iterator stuff */
private:
char *ptr;
};
EDIT3:
Specifically, I am interested in a forward iterator, because I want to avoid iterating over the string twice when I know the algorithm only has to do it once.
There isn't any explicit iterator class, but regular raw pointers are valid iterators as well. Problem with C-strings, though, is that they do not come with a native end iterator, which makes them unusable in range based for loops – directly at least...
You might like to try the following template, though:
template <typename T>
class Range
{
T* b;
public:
class Sentinel
{
friend class Range;
Sentinel() { }
friend bool operator!=(T* t, Sentinel) { return *t; }
public:
Sentinel(Sentinel const& o) { }
};
Range(T* begin)
: b(begin)
{ }
T* begin() { return b; }
Sentinel end() { return Sentinel(); }
};
Usage:
for(auto c : Range<char const>("hello world"))
{
std::cout << c << std::endl;
}
It originally was designed to iterate over null-terminated argv of main, but works with any pointer to null terminated array – which a C-string is as well...
The trick is the comparison against the sentinel, which actually performs a totally different check: whether the current pointer points at the terminating null character (or, for argv, the terminating null pointer)...
Edit: Pre-C++17 variant:
template <typename T>
class Range
{
T* b;
public:
class Wrapper
{
friend class Range;
T* t;
Wrapper(T* t) : t(t) { }
public:
Wrapper(Wrapper const& o) : t(o.t) { }
Wrapper operator++() { ++t; return *this; }
bool operator!=(Wrapper const& o) const { return *t; }
T operator*() { return *t; }
};
Range(T* begin)
: b(begin)
{ }
Wrapper begin() { return Wrapper(b); }
Wrapper end() { return Wrapper(nullptr); }
};
Actually, yes - sort of. In c++17.
C++17 introduces std::string_view which can be constructed from a c-style string.
std::string_view is a random access (proxy) container which of course fully supports iterators.
Note that although constructing a string_view from a const char* will theoretically call std::strlen, the compiler is allowed to (and gcc certainly does) elide the call when it knows the length of the string at compile time.
Example:
#include <string_view>
#include <iostream>
template<class Pointer>
struct pointer_span
{
using iterator = Pointer;
pointer_span(iterator first, std::size_t size)
: begin_(first)
, end_(first + size)
{
}
iterator begin() const { return begin_; }
iterator end() const { return end_; }
iterator begin_, end_;
};
int main(int argc, char** argv)
{
for(auto&& ztr : pointer_span(argv, argc))
{
const char* sep = "";
for (auto ch : std::string_view(ztr))
{
std::cout << sep << ch;
sep = " ";
}
std::cout << std::endl;
}
}
See the example output here
Is there a standard C++ iterator for C strings?
Yes. A pointer is an iterator for an array. C strings are (null terminated) arrays of char. Therefore char* is an iterator for a C string.
... using the common C++ iterator range interface [first, last)
Just like with all other iterators, to have a range, you need to have an end iterator.
If you know or can assume that an array fully contains the string and nothing more, then you can get the iterator range in constant time using std::begin(arr) (std::begin is redundant for C arrays which decay to the pointer anyway, but nice for symmetry) and std::end(arr) - 1. Otherwise you can use pointer arithmetic with offsets within the array.
A little bit of care must be taken to account for the null terminator. One must remember that the full range of the array contains the null terminator of the string. If you want the iterator range to represent the string without the terminator, then subtract one from the end iterator of the array, which explains the subtraction in the previous paragraph.
If you don't have an array, but only a pointer - the begin iterator - you can get the end iterator by advancing the beginning by the length of the string. This advancement is a constant operation, because pointers are random access iterators. If you don't know the length, you can call std::strlen to find out (which isn't a constant operation).
For example, std::sort accepts a range of iterators. You can sort a C string like this:
char str[] = "Hello World!";
std::sort(std::begin(str), std::end(str) - 1);
for(char c : "test"); // range-for-loops work as well, but this includes NUL
In the case you don't know the length of the string:
char *str = get_me_some_string();
std::sort(str, str + std::strlen(str));
Specifically, I am interested in a forward iterator
A pointer is a random access iterator. All random access iterators are also forward iterators. A pointer meets all of the requirements listed in the linked iterator concept.
It is possible to write such iterator, something like this should work:
struct csforward_iterator :
std::iterator<std::bidirectional_iterator_tag, const char, void> {
csforward_iterator( pointer ptr = nullptr ) : p( ptr ) {}
csforward_iterator& operator++() { ++p; return *this; }
csforward_iterator operator++(int) { auto t = *this; ++p; return t; }
csforward_iterator& operator--() { --p; return *this; }
csforward_iterator operator--(int) { auto t = *this; --p; return t; }
bool operator==( csforward_iterator o ) {
return p == o.p or ( p ? not ( o.p or *p ) : not *o.p );
}
bool operator!=( csforward_iterator o ) { return not operator==( o ); }
void swap( csforward_iterator &o ) { std::swap( p, o.p ); }
reference operator*() const { return *p; }
pointer operator->() const { return p; }
private:
pointer p;
};
live example
Unfortunately the standard library does not provide one; if it did, it would probably be a template over the character type (as std::basic_string is).
I'm afraid not, for last you'll need a pointer to the end of the string for which you'll need to call strlen.
If you have a string literal, you can get the end iterator without using std::strlen. If you have only a char*, you'll have to write your own iterator class or rely on std::strlen to get the end iterator.
Demonstrative code for string literals:
#include <iostream>
#include <utility>
template <typename T, size_t N>
std::pair<T*, T*> array_iterators(T (&a)[N]) { return std::make_pair(&a[0], &a[0]+N); }
int main()
{
auto iterators = array_iterators("This is a string.");
// The second of the iterators points one character past the terminating
// null character. To iterate over the characters of the string, we need to
// stop at the terminating null character.
for ( auto it = iterators.first; it != iterators.second-1; ++it )
{
std::cout << *it << std::endl;
}
}
For ultimate safety and flexibility, you end up wrapping the iterator, and it has to carry some state.
Issues include:
random access - which can be addressed in a wrapped pointer by limiting its overloads to block random access, or by making it strlen() on need
multiple iterators - when comparing with each other, not end
decrementing end - which you could again "fix" by limiting the overloads
begin() and end() need to be same type - in c++11 and some api calls.
a non-const iterator could add or remove content
Note that it is "not the iterator's problem" if it is randomly seeked outside the range of the container, and it can legally seek past a string_view.end(). It is also fairly standard that such a broken iterator could not then increment to end() any more.
The most painful of these conditions is that end can be decremented, or subtracted, and dereferenced (usually you can't, but for string it is a null character). This means the end object needs a flag that it is the end, and the address of the start, so that it can find the actual end using strlen() if either of these operations occurs.
Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string
Iterators are a generalization of pointers. Specifically, they're designed so that pointers are valid iterators.
Note the pointer specializations of std::iterator_traits.
I know I can use a pointer as an iterator, but I would have to know where the string ends
Unless you have some other way to know where the string ends, calling strlen is the best you can do. If there were a magic iterator wrapper, it would also have to call strlen.
Sorry, but an iterator is normally obtained from an iterable instance. char * is a built-in type, not a class, so how would something like .begin() or .end() be provided for it?
By the way, if you need to iterate over a char *p that you know is NUL-terminated, you can simply do the following:
for( char *p = your_string; *p; ++p ) {
...
}
What you cannot do is call iterator-producing member functions on it as you would on a C++ container, because char * is a built-in type: it has no constructors, destructors, or member functions.

Proxy objects in iterators

I have a big vector of items that belong to a certain class.
struct item {
int class_id;
//some other data...
};
The same class_id can appear multiple times in the vector, and the vector is constructed once and then sorted by class_id. So all elements of the same class are next to each other in the vector.
I later have to process the items per class, ie. I update all items of the same class but I do not modify any item of a different class. Since I have to do this for all items and the code is trivially parallelizable I wanted to use Microsoft PPL with Concurrency::parallel_for_each(). Therefore I needed an iterator and came up with a forward iterator that returns the range of all items with a certain class_id as proxy object. The proxy is simply a std::pair and the proxy is the iterator's value type.
using item_iterator = std::vector<item>::iterator;
using class_range = std::pair<item_iterator, item_iterator>;
//iterator definition
class per_class_iterator : public std::iterator<std::forward_iterator_tag, class_range> { /* ... */ };
By now I was able to loop over all my classes and update the items like this.
std::vector<item> items;
//per_class_* returns a per_class_iterator
std::for_each(items.per_class_begin(), items.per_class_end(),
[](class_range r)
{
//do something for all items in r
std::for_each(r.first, r.second, /* some work */);
});
When replacing std::for_each with Concurrency::parallel_for_each the code crashed. After debugging I found the problem to be the following code in _Parallel_for_each_helper in ppl.h at line 2772 ff.
// Add a batch of work items to this functor's array
for (unsigned int _Index=0; (_Index < _Size) && (_First != _Last); _Index++)
{
_M_element[_M_len++] = &(*_First++);
}
It uses postincrement (so a temporary iterator is returned), dereferences that temporary iterator and takes the address of the dereferenced item. This only works if the item returned by dereferencing a temporary object survives, ie. basically if it points directly into the container. So fixing this is easy, albeit the per class std::for_each work loop has to be replaced with a for-loop.
//it := iterator somewhere into the vector of items (item_iterator)
for(const auto cur_class = it->class_id; cur_class == it->class_id; ++it)
{
/* some work */
}
My question is if returning proxy objects the way I did is violating the standard or if the assumption that every iterator dereferences into permanent data has been made by Microsoft for their library, but is not documented. At least I could not find any documentation on the iterator requirements for parallel_for_each() except that either a random access or a forward iterator are expected. I have seen the question about forward iterators and vector but since my iterator's reference type is const value_type& I still think my iterator is ok by the standard. So is a forward iterator returning a proxy object still a valid forward iterator? Or put another way, is it ok for an iterator to have a value type different from a type that is actually stored somewhere in a container?
Compilable example:
#include <vector>
#include <utility>
#include <cassert>
#include <iterator>
#include <memory>
#include <algorithm>
#include <iostream>
#include <ppl.h>
using identifier = int;
struct item
{
identifier class_id;
// other data members
// ...
bool operator<(const item &rhs) const
{
return class_id < rhs.class_id;
}
bool operator==(const item &rhs) const
{
return class_id == rhs.class_id;
}
//inverse operators omitted
};
using container = std::vector<item>;
using item_iterator = typename container::iterator;
using class_range = std::pair<item_iterator, item_iterator>;
class per_class_iterator : public std::iterator<std::forward_iterator_tag, class_range>
{
public:
per_class_iterator() = default;
per_class_iterator(const per_class_iterator&) = default;
per_class_iterator& operator=(const per_class_iterator&) = default;
explicit per_class_iterator(container &data) :
data_(std::addressof(data)),
class_(equal_range(data_->front())), //this would crash for an empty container. assume it's not.
next_(class_.second)
{
assert(!data_->empty()); //a little late here
assert(std::is_sorted(std::cbegin(*data_), std::cend(*data_)));
}
reference operator*()
{
//if data_ is unset the iterator is an end iterator. dereferencing end iterators is bad.
assert(data_ != nullptr);
return class_;
}
per_class_iterator& operator++()
{
assert(data_ != nullptr);
//if we are at the end of our data
if(next_ == data_->end())
{
//reset the data pointer, ie. make iterator an end iterator
data_ = nullptr;
}
else
{
//set to the class of the next element
class_ = equal_range(*next_);
//and update the next_ iterator
next_ = class_.second;
}
return *this;
}
per_class_iterator operator++(int)
{
per_class_iterator tmp{*this};
++(*this);
return tmp;
}
bool operator!=(const per_class_iterator &rhs) const noexcept
{
return (data_ != rhs.data_) ||
(data_ != nullptr && rhs.data_ != nullptr && next_ != rhs.next_);
}
bool operator==(const per_class_iterator &rhs) const noexcept
{
return !(*this != rhs);
}
private:
class_range equal_range(const item &i) const
{
return std::equal_range(data_->begin(), data_->end(), i);
}
container* data_ = nullptr;
class_range class_;
item_iterator next_;
};
per_class_iterator per_class_begin(container &c)
{
return per_class_iterator{c};
}
per_class_iterator per_class_end()
{
return per_class_iterator{};
}
int main()
{
std::vector<item> items;
items.push_back({1});
items.push_back({1});
items.push_back({3});
items.push_back({3});
items.push_back({3});
items.push_back({5});
//items are already sorted
//#define USE_PPL
#ifdef USE_PPL
Concurrency::parallel_for_each(per_class_begin(items), per_class_end(),
#else
std::for_each(per_class_begin(items), per_class_end(),
#endif
[](class_range r)
{
//this loop *cannot* be parallelized trivially
std::for_each(r.first, r.second,
[](item &i)
{
//update item (by evaluating all other items of the same class) ...
//building big temporary data structure for all items of same class ...
//i.processed = true;
std::cout << "item: " << i.class_id << '\n';
});
});
return 0;
}
When you're writing a proxy iterator, the reference type should be a class type, precisely because it can outlive the iterator it is derived from. So, for a proxy iterator, when instantiating the std::iterator base you should specify the Reference template parameter as a class type, typically the same as the value type:
class per_class_iterator : public std::iterator<
std::forward_iterator_tag, class_range, std::ptrdiff_t, class_range*, class_range>
~~~~~~~~~~~
Unfortunately, PPL is not keen on proxy iterators and will break compilation:
ppl.h(2775): error C2338: lvalue required for forward iterator operator *
ppl.h(2772): note: while compiling class template member function 'Concurrency::_Parallel_for_each_helper<_Forward_iterator,_Function,1024>::_Parallel_for_each_helper(_Forward_iterator &,const _Forward_iterator &,const _Function &)'
with
[
_Forward_iterator=per_class_iterator,
_Function=main::<lambda_051d98a8248e9970abb917607d5bafc6>
]
This is actually a static_assert:
static_assert(std::is_lvalue_reference<decltype(*_First)>::value, "lvalue required for forward iterator operator *");
This is because the enclosing class _Parallel_for_each_helper stores an array of pointers and expects to be able to indirect them later:
typename std::iterator_traits<_Forward_iterator>::pointer _M_element[_Size];
Since PPL doesn't check that pointer is actually a pointer, we can exploit this by supplying a proxy pointer with an operator* and overloading class_range::operator&:
struct class_range_ptr;
struct class_range : std::pair<item_iterator, item_iterator> {
using std::pair<item_iterator, item_iterator>::pair;
class_range_ptr operator&();
};
struct class_range_ptr {
class_range range;
class_range& operator*() { return range; }
class_range const& operator*() const { return range; }
};
inline class_range_ptr class_range::operator&() { return{*this}; }
class per_class_iterator : public std::iterator<
std::forward_iterator_tag, class_range, std::ptrdiff_t, class_range_ptr, class_range&>
{
// ...
This works great:
item: item: 5
1
item: 3item: 1
item: 3
item: 3
Press any key to continue . . .
For your direct question: no, an iterator does not have to be related to any kind of container. About the only requirements for an iterator are that it:
is copy-constructible, copy-assignable and destructible
supports equality/inequality
is dereferenceable
An iterator does not necessarily have to be tied to a particular container (see generators), so it cannot be said that it "has to have the same type as the container", because in the generic case there is no container.
It seems, however, that a custom iterator class may actually be overkill in your case. Here's why:
In C++, an array/vector end iterator is an iterator pointing just past the last item.
Given a vector of objects of "classes" (in your definition) A, B, C, etc., filled like this:
AAAAAAABBBBBBBBBBBBCCCCCCCD.......
You can just take regular vector iterators that will act as your range starts and ends:
AAAAAAABBBBBBBBBBBBCCCCCCCD......Z
^ ^ ^ ^ ^
i1 i2 i3 i4 iN
For the iterators you see here, the following is true:
i1 is begin iterator for class A
i2 is end iterator for class A and begin iterator for class B
i3 is end iterator for class B and begin iterator for class C etc.
Hence, for each class you can have a pair of iterators which are start and end of the respective class range.
Hence, your processing is as trivial as:
for(auto it = i1; it != i2; ++it) processA(*it);
for(auto it = i2; it != i3; ++it) processB(*it);
for(auto it = i3; it != i4; ++it) processC(*it);
Each loop is trivially parallelizable:
parallel_for_each(i1, i2, processA);
parallel_for_each(i2, i3, processB);
parallel_for_each(i3, i4, processC);
To use a range-based for, you can introduce a substitute range class:
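template <typename T>
class vector_range {
public:
using const_iterator = typename std::vector<T>::const_iterator;
// Trivial constructor filling the _begin and _end fields
vector_range(const_iterator begin, const_iterator end) : _begin(begin), _end(end) {}
const_iterator begin() const { return _begin; }
const_iterator end() const { return _end; }
private:
const_iterator _begin, _end;
};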
That is to say, you don't really need a proxy iterator to parallelize the loops: the way C++ iterators are designed already covers your case.

Iterating in a sorted manner over a std::vector<std::pair<T,U> > object

I am reading an object of type Foo, as defined below, from a database. This object is a vector of FooMembers, where a FooMember consists of a string id and a container object.
typedef std::pair<std::string, Container> FooMember;
typedef std::vector<FooMember> Foo;
I wish to iterate over a Foo object in its sorted form, where sorting is done with respect to the id. To do this I am using the following function to create first a sorted version of the object. As you can see, the object is sorted in a case insensitive manner. Is there a better way for me to iterate over this object compared to how I am currently doing it?
Foo sortedFoo(Foo& value) const {
Foo returnValue;
returnValue.reserve(value.size());
// use a map to sort the items
std::map<std::string, FooMember> sortedMembers;
{
Foo::iterator i = value.begin();
Foo::iterator end = value.end();
for(; i!=end; ++i) {
std::string name = i->first;
boost::algorithm::to_lower(name);
sortedMembers[name] = *i;
}
}
// convert the map to a vector of its values in sorted order
std::map<std::string, FooMember >::iterator i = sortedMembers.begin();
std::map<std::string, FooMember >::iterator end = sortedMembers.end();
for(; i!=end; ++i) {
returnValue.push_back(i->second);
}
return returnValue;
}
Yes: Copy the vector, then use std::sort with a custom comparison predicate:
struct ByIdCaseInsensitive {
bool operator ()(const FooMember& lhs, const FooMember& rhs) const {
return boost::algorithm::to_lower_copy(lhs.first) <
boost::algorithm::to_lower_copy(rhs.first);
}
};
Way more efficient than filling a map, and then copying back to a vector.
The predicate would be even better if it used a proper Unicode collation algorithm, but that isn't available in the standard library or Boost.
You can use std::sort
#include <algorithm>
bool comparator(const FooMember& i, const FooMember& j)
{
std::string str1 = i.first;
boost::algorithm::to_lower(str1);
std::string str2 = j.first;
boost::algorithm::to_lower(str2);
return (str1 < str2);
}
void sortFoo(Foo& value) {
std::sort (value.begin(), value.end(), comparator);
}
Or, you can keep Foo objects in a std::map<std::string, Foo> from the beginning so they remain always sorted.
The best way would be to use std::sort with a custom comparator for FooMembers:
bool cmp(const FooMember& lhs, const FooMember& rhs);
Foo sortedFoo(const Foo& value) const
{
Foo tmp = value;
// std::sort returns void, so sort the copy first and return it afterwards
std::sort(tmp.begin(), tmp.end(), cmp);
return tmp;
}
where the comparison can be implemented with the help of std::lexicographical_compare and tolower:
#include <cctype> // for std::tolower
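bool ci_cmp(char a, char b)
{
// std::tolower is only defined for values representable as unsigned char (or EOF)
return std::tolower(static_cast<unsigned char>(a)) < std::tolower(static_cast<unsigned char>(b));
}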
#include <algorithm> // for std::sort, std::lexicographical_compare
bool cmp(const FooMember& lhs, const FooMember& rhs)
{
return std::lexicographical_compare(lhs.first.begin(),
lhs.first.end(),
rhs.first.begin(),
rhs.first.end(),
ci_cmp);
}
You can also use std::sort with a lambda expression:
std::sort(value.begin(), value.end(), [](const FooMember &lhs, const FooMember &rhs)
{
std::string str1 = lhs.first, str2 = rhs.first;
boost::algorithm::to_lower(str1);
boost::algorithm::to_lower(str2);
return str1 < str2;
});
Or use the version provided by erelender. It's up to you.
Semantically, std::vector<std::pair<T,U> > resembles a std::map<T,U> (although the implementations are usually different). If you can redesign Foo, you are probably better off doing so. As a side effect, you get sorting for free.
typedef std::map<std::string, Container> Foo;

How does this printing code work?

I found this code somewhere. It prints "abcd" to the screen but in a weird way. I would like someone to tell me how it works:
#include <iostream>
#include <sstream>
class X
{
typedef std::istreambuf_iterator<char> Iter;
Iter it;
public:
X(std::streambuf* p) : it(p) { }
Iter begin()
{ return it; }
Iter end()
{ return Iter(); }
};
void printbuf(X x, std::ostreambuf_iterator<char> it)
{
for (auto c : x)
{
*it = c;
}
}
int main()
{
std::stringbuf buf("abcd", std::ios_base::in | std::ios_base::out);
printbuf(&buf, std::cout);
}
We have a class X which encapsulates an istreambuf_iterator<char>. This is an iterator type which allows us to treat a stream buffer as an iterator range for standard algorithms.
class X
{
typedef std::istreambuf_iterator<char> Iter;
Iter it;
public:
The class is constructible from a pointer to a stream buffer instance.
X(std::streambuf* p) : it(p) { }
It exposes begin() and end() member functions to allow it to be used with the range-based for loop.
Iter begin()
{ return it; }
Iter end()
{ return Iter(); }
};
printbuf() is a function which accepts an instance of our range class X, as well as an ostreambuf_iterator<char>, which—you guessed it—allows us to use an output stream buffer as an output iterator.
void printbuf(X x, std::ostreambuf_iterator<char> it)
{
So we iterate over every character in the input range.
for (auto c : x)
{
If you haven’t dealt with output iterators before, you can think of them as an object resembling a pointer, to which you write values using dereference and assignment. back_insert_iterator is a commonly used output iterator, for building containers—you usually construct it using back_inserter. But I digress.
We copy each character to the output iterator.
*it = c;
}
}
int main()
{
Here we construct a string buffer, which is both an input and output stream buffer. We only use the input capability in this example.
std::stringbuf buf("abcd", std::ios_base::in | std::ios_base::out);
Now we use an implicitly-constructed X instance to treat the string buffer as an iterator range. Then we copy that range to an output stream buffer iterator—also implicitly constructed—to std::cout.
printbuf(&buf, std::cout);
}
The effect is that we’re looping over each character in the buffer and copying it to standard output.
printbuf(&buf, std::cout);
Passing a std::stringbuf* as the first parameter causes an implicit construction of an X to match printbuf().
For the second parameter an implicit construction occurs as well: an instance of std::ostreambuf_iterator<char> is created from std::cout (a std::ostream).
void printbuf(X x, std::ostreambuf_iterator<char> it)
{
for (auto c : x)
{
*it = c;
}
}
In printbuf, the foreach loop (range based for loop) uses X::begin() and X::end() to loop over all characters in that wrapped std::stringbuf and writes them to std::cout via the std::ostreambuf_iterator (it)

Using C-style arrays as backend for STL string operations

I'm writing a library to read some specific file formats. The files are read via memory-mapped files (boost::interprocess templates). On these files I have to do some searches with std::regex. To avoid unnecessary copying I want to use the memory-mapped file directly (as a C-style char array).
After some research I came up with the following two approaches:
Using the pubsetbuf method of a streambuf object
Using the char* pointer as iterator
but since the implementation of the first one is optional for the STL vendor, I'm stuck with the second approach. Since the constructor for std::string::iterator is declared private and the whole iterator implementation also seems to be vendor specific, I wrote my own iterator:
template<typename T>
class PointerIterator: std::iterator<std::input_iterator_tag, T> {
public:
PointerIterator(T* first, std::size_t count): first_(first), last_(first + count) {}
PointerIterator(T* first, T* last): first_(first), last_(last) {}
class iterator {
public:
iterator(T* p): ptr_(p) {}
iterator(const iterator& it): ptr_(it.ptr_) {}
iterator& operator++() {
++ptr_;
return *this;
}
iterator operator++(int) {
iterator temp(*this);
++ptr_;
return temp;
}
bool operator==(const iterator& it) { return ptr_ == it.ptr_; }
bool operator!=(const iterator& it) { return ptr_ != it.ptr_; }
T& operator*() { return *ptr_; }
private:
T* ptr_;
};
iterator begin() {
return iterator(first_);
}
iterator end() {
return iterator(last_);
}
private:
T* first_;
T* last_;
};
The iterator works, but for use with std::regex_search (or other char-related STL functions) it must be of the same type as the STL iterators.
Is there some generic approach to cast my iterators to the STL ones (portable across STL implementations), or to achieve the entire thing with another approach I didn't mention?
Edit:
The source using std::regex_search:
std::regex re(...);
boost::interprocess::mapped_region region(...);
char* first = static_cast<char*>(region.get_address());
char* last = first + 5000;
// ...
PointerIterator<char> wrapper(first, last);
std::smatch match;
while (std::regex_search(wrapper.begin(), wrapper.end(), match, re)) { // Error: No matching function call to 'regex_search'
// do something
}
Thanks
The definition of std::smatch is a specialization of std::match_results. This specialization uses string::const_iterator as the iterator type in the template arguments passed to std::match_results. This requires the begin and end arguments passed to std::regex_search to also be of type string::const_iterator.
In C++, pointers satisfy the requirements of random access (and therefore bidirectional) iterators, so it is not necessary to wrap them in an iterator class. If you need to search through a buffer pointed to by a char pointer, you can either use std::cmatch or use std::match_results and specify the iterator type explicitly. In both cases the iterators passed to std::regex_search must have exactly the iterator type of the match object, so the two examples below pass const char* pointers rather than the iterators of PointerIterator. I have also included a stand-alone example you can reference.
const char* cfirst = first; // same buffer, viewed through const char*
const char* clast = last;
std::cmatch match; // <<--
while (std::regex_search(cfirst, clast, match, re))
{
// do something
}
...using std::match_results instead.
const char* cfirst = first;
const char* clast = last;
std::match_results<const char*> match; // <<--
while (std::regex_search(cfirst, clast, match, re))
{
// do something
}
Below is a stand alone example that should provide a bit of codified clarification. It is based on the example on cppreference.com and uses const char* instead of std::string as the search target.
#include <cstring> // for std::strlen
#include <iostream>
#include <regex>
#include <string>
int main()
{
const char *haystack = "Roses are #ff0000";
const std::size_t size = std::strlen(haystack);
std::regex pattern(
"#([a-f0-9]{2})"
"([a-f0-9]{2})"
"([a-f0-9]{2})");
std::cmatch results;
std::regex_search(haystack, haystack + size, results, pattern);
for (size_t i = 0; i < results.size(); ++i) {
std::csub_match sub_match = results[i];
std::string sub_match_str = sub_match.str();
std::cout << i << ": " << sub_match_str << '\n';
}
}
This produces the following output.
0: #ff0000
1: ff
2: 00
3: 00