Is there a standard C++ iterator for C strings? - c++

Sometimes I need to pass a C string to a function using the common C++ iterator range interface [first, last). Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string or call strlen()?
EDIT:
I know I can use a pointer as an iterator, but I would have to know where the string ends, which would require me to call strlen().
EDIT2:
While I didn't know whether such an iterator is standardized, I certainly know it is possible. Responding to the sarcastic answers and comments, this is the stub (incomplete, untested):
class CStringIterator
{
public:
CStringIterator(char *str=nullptr):
ptr(str)
{}
bool operator==(const CStringIterator& other) const
{
if(other.ptr) {
return ptr == other.ptr;
} else {
return !*ptr;
}
}
/* ... operator++ and other iterator stuff */
private:
char *ptr;
};
EDIT3:
Specifically, I am interested in a forward iterator, because I want to avoid iterating over the string twice when I know the algorithm will only have to do it once.

There isn't any explicit iterator class, but regular raw pointers are valid iterators as well. The problem with C strings, though, is that they do not come with a native end iterator, which makes them unusable in range-based for loops, at least directly.
You might like to try the following template, though:
template <typename T>
class Range
{
T* b;
public:
class Sentinel
{
friend class Range;
Sentinel() { }
friend bool operator!=(T* t, Sentinel) { return *t; }
public:
Sentinel(Sentinel const& o) { }
};
Range(T* begin)
: b(begin)
{ }
T* begin() { return b; }
Sentinel end() { return Sentinel(); }
};
Usage:
for(auto c : Range<char const>("hello world"))
{
std::cout << c << std::endl;
}
It was originally designed to iterate over the null-terminated argv of main, but it works with any pointer to a null-terminated array, which a C string is as well.
The trick is the comparison against the sentinel, which actually performs a completely different check: whether the current pointer points at the terminating null (character or pointer).
Edit: Pre-C++17 variant:
template <typename T>
class Range
{
T* b;
public:
class Wrapper
{
friend class Range;
T* t;
Wrapper(T* t) : t(t) { }
public:
Wrapper(Wrapper const& o) : t(o.t) { }
Wrapper operator++() { ++t; return *this; }
bool operator!=(Wrapper const& o) const { return *t; }
T operator*() { return *t; }
};
Range(T* begin)
: b(begin)
{ }
Wrapper begin() { return Wrapper(b); }
Wrapper end() { return Wrapper(nullptr); }
};
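The sentinel idea is not tied to range-based for loops. As a minimal sketch (the names NullSentinel, count_char and count_in_c_string are hypothetical and not part of the answer above), any hand-written algorithm that only evaluates first != last can take a begin pointer and an end marker of a different type, and therefore walks the string exactly once without calling strlen:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag type standing in for "end of a null-terminated string".
struct NullSentinel {};

// A pointer differs from the sentinel as long as it does not point at '\0'.
inline bool operator!=(const char* p, NullSentinel) { return *p != '\0'; }

// Single-pass count over [first, last); First and Last may be different types.
template <typename First, typename Last>
std::size_t count_char(First first, Last last, char wanted) {
    std::size_t n = 0;
    for (; first != last; ++first) {
        if (*first == wanted) ++n;
    }
    return n;
}

// Convenience wrapper for C strings (assumes s is non-null and terminated).
inline std::size_t count_in_c_string(const char* s, char wanted) {
    assert(s != nullptr);
    return count_char(s, NullSentinel{}, wanted);
}
```

Standard algorithms prior to C++20 require both iterators to have the same type, so this only helps with algorithms you write yourself.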

Actually, yes - sort of. In c++17.
C++17 introduces std::string_view which can be constructed from a c-style string.
std::string_view is a random access (proxy) container which of course fully supports iterators.
Note that although constructing a string_view from a const char* will theoretically call std::strlen, the compiler is allowed to (and gcc certainly does) elide the call when it knows the length of the string at compile time.
Example:
#include <string_view>
#include <iostream>
template<class Pointer>
struct pointer_span
{
using iterator = Pointer;
pointer_span(iterator first, std::size_t size)
: begin_(first)
, end_(first + size)
{
}
iterator begin() const { return begin_; }
iterator end() const { return end_; }
iterator begin_, end_;
};
int main(int argc, char** argv)
{
for(auto&& ztr : pointer_span(argv, argc))
{
const char* sep = "";
for (auto ch : std::string_view(ztr))
{
std::cout << sep << ch;
sep = " ";
}
std::cout << std::endl;
}
}
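For the [first, last) interface from the question, a short sketch (assuming C++17; count_char is a hypothetical helper name) shows how the view's begin() and end() can be handed to a standard algorithm. The string is still measured once when the view is constructed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// Counts occurrences of `wanted` by passing an iterator pair to std::count.
inline std::size_t count_char(const char* s, char wanted) {
    assert(s != nullptr);
    const std::string_view view(s);  // length is determined here, no copy is made
    return static_cast<std::size_t>(
        std::count(view.begin(), view.end(), wanted));
}
```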

Is there a standard C++ iterator for C strings?
Yes. A pointer is an iterator for an array. C strings are (null terminated) arrays of char. Therefore char* is an iterator for a C string.
... using the common C++ iterator range interface [first, last)
Just like with all other iterators, to have a range, you need to have an end iterator.
If you know or can assume that an array fully contains the string and nothing more, then you can get the iterator range in constant time using std::begin(arr) (std::begin is redundant for C arrays which decay to the pointer anyway, but nice for symmetry) and std::end(arr) - 1. Otherwise you can use pointer arithmetic with offsets within the array.
A little bit of care must be taken to account for the null terminator. One must remember that the full range of the array contains the null terminator of the string. If you want the iterator range to represent the string without the terminator, then subtract one from the end iterator of the array, which explains the subtraction in the previous paragraph.
If you don't have an array, but only a pointer - the begin iterator - you can get the end iterator by advancing the beginning by the length of the string. This advancement is a constant operation, because pointers are random access iterators. If you don't know the length, you can call std::strlen to find out (which isn't a constant operation).
Example, std::sort accepts a range of iterators. You can sort a C string like this:
char str[] = "Hello World!";
std::sort(std::begin(str), std::end(str) - 1);
for(char c : "test"); // range-for-loops work as well, but this includes NUL
In the case you don't know the length of the string:
char *str = get_me_some_string();
std::sort(str, str + std::strlen(str));
Specifically, I am interested in a forward iterator
A pointer is a random access iterator. All random access iterators are also forward iterators. A pointer meets all of the requirements listed in the linked iterator concept.

It is possible to write such an iterator; something like this should work:
struct csforward_iterator :
std::iterator<std::bidirectional_iterator_tag, const char, void> {
csforward_iterator( pointer ptr = nullptr ) : p( ptr ) {}
csforward_iterator& operator++() { ++p; return *this; }
csforward_iterator operator++(int) { auto t = *this; ++p; return t; }
csforward_iterator& operator--() { --p; return *this; }
csforward_iterator operator--(int) { auto t = *this; --p; return t; }
bool operator==( csforward_iterator o ) const {
return p == o.p or ( p ? not ( o.p or *p ) : not *o.p );
}
bool operator!=( csforward_iterator o ) const { return not operator==( o ); }
void swap( csforward_iterator &o ) { std::swap( p, o.p ); }
reference operator*() const { return *p; }
pointer operator->() const { return p; }
private:
pointer p;
};
Unfortunately, the standard library does not provide one; if it did, it would probably be a template over the character type (like std::string).
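As a variation on the same idea, here is a minimal sketch that spells out the iterator typedefs by hand instead of inheriting from std::iterator (which is deprecated since C++17). The names cstr_iterator and count_spaces are hypothetical, and a default-constructed iterator is assumed to act as the end marker:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

class cstr_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = const char&;

    cstr_iterator() = default;                        // acts as the end marker
    explicit cstr_iterator(const char* s) : p_(s) {}

    reference operator*() const { return *p_; }
    pointer operator->() const { return p_; }
    cstr_iterator& operator++() { ++p_; return *this; }
    cstr_iterator operator++(int) { cstr_iterator t = *this; ++p_; return t; }

    // Two iterators are equal if both are at the end, or both point at the same char.
    friend bool operator==(const cstr_iterator& a, const cstr_iterator& b) {
        return a.at_end() == b.at_end() && (a.at_end() || a.p_ == b.p_);
    }
    friend bool operator!=(const cstr_iterator& a, const cstr_iterator& b) {
        return !(a == b);
    }

private:
    bool at_end() const { return p_ == nullptr || *p_ == '\0'; }
    const char* p_ = nullptr;
};

// Single pass over the string with a standard algorithm, no strlen needed.
inline std::ptrdiff_t count_spaces(const char* s) {
    assert(s != nullptr);
    return std::count(cstr_iterator(s), cstr_iterator(), ' ');
}
```

One consequence of this design is that end iterators of different strings compare equal, which is acceptable for single-range algorithms but worth keeping in mind.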

I'm afraid not: for last you need a pointer to the end of the string, and to get that you need to call strlen.

If you have a string literal, you can get the end iterator without using std::strlen. If you have only a char*, you'll have to write your own iterator class or rely on std::strlen to get the end iterator.
Demonstrative code for string literals:
#include <cstddef>
#include <iostream>
#include <utility>
template <typename T, std::size_t N>
std::pair<T*, T*> array_iterators(T (&a)[N]) { return std::make_pair(&a[0], &a[0]+N); }
int main()
{
auto iterators = array_iterators("This is a string.");
// The second of the iterators points one character past the terminating
// null character. To iterate over the characters of the string, we need to
// stop at the terminating null character.
for ( auto it = iterators.first; it != iterators.second-1; ++it )
{
std::cout << *it << std::endl;
}
}

For ultimate safety and flexibility, you end up wrapping the iterator, and it has to carry some state.
Issues include:
random access - which can be addressed in a wrapped pointer by limiting its overloads to block random access, or by making it strlen() on need
multiple iterators - when comparing with each other, not end
decrementing end - which you could again "fix" by limiting the overloads
begin() and end() need to be same type - in c++11 and some api calls.
a non-const iterator could add or remove content
Note that it is "not the iterator's problem" if it is randomly seeked outside the range of the container, and it can legally seek past a string_view.end(). It is also fairly standard that such a broken iterator could not then increment to end() any more.
The most painful of these conditions is that end can be decremented, or subtracted, and dereferenced (usually you can't, but for string it is a null character). This means the end object needs a flag that it is the end, and the address of the start, so that it can find the actual end using strlen() if either of these operations occurs.

Is there a standard C++ iterator class for those cases, or a standard way of doing it without having to copy the string
Iterators are a generalization of pointers. Specifically, they're designed so that pointers are valid iterators.
Note the pointer specializations of std::iterator_traits.
I know I can use a pointer as an iterator, but I would have to know where the string ends
Unless you have some other way to know where the string ends, calling strlen is the best you can do. If there were a magic iterator wrapper, it would also have to call strlen.

Sorry, but an iterator is normally obtained from an iterable instance. Since char * is a basic type and not a class, how do you think something like .begin() or .end() could be provided?
By the way, if you need to iterate over a char *p that you know is NUL-terminated, you can simply do the following:
for( char *p = your_string; *p; ++p ) {
...
}
The point, though, is that you cannot use iterators as they are defined in C++, because char * is a basic type: it has no constructor, no destructor, and no associated methods.

Related

How does std::reverse_iterator hold one before begin?

This is a code example using std::reverse_iterator:
template<typename T, size_t SIZE>
class Stack {
T arr[SIZE];
size_t pos = 0;
public:
T pop() {
return arr[--pos];
}
Stack& push(const T& t) {
arr[pos++] = t;
return *this;
}
auto begin() {
return std::reverse_iterator(arr+pos);
}
auto end() {
return std::reverse_iterator(arr);
// ^ does reverse_iterator take this `one back`? how?
}
};
int main() {
Stack<int, 4> s;
s.push(5).push(15).push(25).push(35);
for(int val: s) {
std::cout << val << ' ';
}
}
// output is as expected: 35 25 15 5
When using std::reverse_iterator as an adaptor for another iterator, the newly adapted end shall be one before the original begin. However calling std::prev on begin is UB.
How does std::reverse_iterator hold one before begin?
Initialization of std::reverse_iterator from an iterator does not decrease the iterator upon initialization, as it would then be UB when sending begin to it (one cannot assume that std::prev(begin) is a valid call).
The trick is simple: std::reverse_iterator holds the original iterator passed to it, without modifying it. Only when it is dereferenced does it step back to the actual element. So, in a way, the iterator internally points to the element after the current one, from which it can reach the current one.
It would look something like:
// partial possible implementation of reverse_iterator for demo purpose
template<typename Itr>
class reverse_iterator {
Itr itr;
public:
constexpr explicit reverse_iterator(Itr itr): itr(itr) {}
constexpr auto& operator*() {
return *std::prev(itr); // <== only here we peek back
}
constexpr auto& operator++() {
--itr;
return *this;
}
friend bool operator!=(reverse_iterator<Itr> a, reverse_iterator<Itr> b) {
return a.itr != b.itr;
}
};
This is however an internal implementation detail (and can be in fact implemented in other similar manners). The user of std::reverse_iterator shall not be concerned with how it is implemented.
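The relationship can be observed from the outside through the public base() member. The following sketch (base_is_one_past is a hypothetical helper name) checks that rbegin() wraps end(), rend() wraps begin(), and dereferencing yields the element just before base():

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Returns true if the reverse iterators of v wrap the unmodified forward
// iterators and dereference to the element one position before base().
inline bool base_is_one_past(const std::vector<int>& v) {
    assert(!v.empty());
    auto rfirst = v.rbegin();  // wraps v.end()
    auto rlast = v.rend();     // wraps v.begin(), never decremented past it
    return rfirst.base() == v.end()
        && rlast.base() == v.begin()
        && &*rfirst == &v.back();
}
```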

c++ foreach loop with pointers instead of references

I have this Container class with begin() and end() functions for use with c++11 foreach loops:
class Element
{
//content doesn't matter
};
class Container
{
Element* elements;
int size;
public:
/* constructor, destructor, operators, methods, etc.. */
Element* begin() { return elements; }
Element* end() { return elements + size; }
};
This is now a valid c++11 foreach loop:
Container container;
for (Element& e : container)
{
//do something
}
But now consider this foreach loop:
Container container;
for (Element* e : container)
{
//do something
}
Is it possible to have a foreach loop with Element* instead of Element& like this?
This would also have the great advantage of preventing one from typing for (Element e : container) which would copy the element each time.
In that case begin() and end() would have to return Element** as far as I know.
But sizeof(Element) is not guaranteed to be sizeof(Element*) and in most cases they don't match. Incrementing a pointer increments by the base type size which is sizeof(Element) for incrementing Element* and sizeof(Element*) for incrementing Element**.
So the prefix operator++() will offset the pointer by a false value and things get crappy. Any ideas how to get this to work?
I agree with LRiO that what you have right now is probably the best solution. It additionally lines up with how the standard containers operate, and taking the path of least surprise for your users is always the best path to take (barring compelling reasons to diverge).
That said, you can certainly get the behavior you want:
class Container
{
// ...
struct iterator {
Element* e;
// this is the important one
Element* operator*() { return e; }
// the rest are just boilerplate
iterator& operator++() { ++e; return *this; }
iterator operator++(int) {
iterator tmp{e};
++*this;
return tmp;
}
bool operator==(iterator rhs) const { return e == rhs.e; }
bool operator!=(iterator rhs) const { return e != rhs.e; }
};
iterator begin() { return {elements}; };
iterator end() { return {elements + size}; };
};
You could consider inheriting from std::iterator to get the typedefs right, or using boost::iterator_facade. But this'll at least give you the functionality.
I can see why you'd want to do this, since an Element*→Element typo would be caught as an error right away, whereas an Element&→Element typo is usually a silent bug. However, in the grand scheme of things, I don't think it's worth transforming your entire container.
You could try to create a container adaptor that maintains the current behaviour of its iterators, except in the value type they expose… but I'm not even sure it's possible to do that without breaking various preconditions.
Personally, I just wouldn't.

How to implement operator-> for an iterator that constructs its values on-demand?

I have a C++ class that acts like a container: it has size() and operator[] member functions. The values stored "in" the container are std::tuple objects. However, the container doesn't actually hold the tuples in memory; instead, it constructs them on-demand based on underlying data stored in a different form.
std::tuple<int, int, int>
MyContainer::operator[](std::size_t n) const {
// Example: draw corresponding elements from parallel arrays
return { underlying_data_a[n], underlying_data_b[n], underlying_data_c[n] };
}
Hence, the return type of operator[] is a temporary object, not a reference. (This means it's not an lvalue, so the container is read-only; that's OK.)
Now I'm writing an iterator class that can be used to traverse the tuples in this container. I'd like to model RandomAccessIterator, which depends on InputIterator, but InputIterator requires support for the expression i->m (where i is an iterator instance), and as far as I can tell, an operator-> function is required to return a pointer.
Naturally, I can't return a pointer to a temporary tuple that's constructed on-demand. One possibility that comes to mind is to put a tuple instance into the iterator as a member variable, and use it to store a copy of whichever value the iterator is currently positioned on:
class Iterator {
private:
MyContainer *container;
std::size_t current_index;
// Copy of (*container)[current_index]
std::tuple<int, int, int> current_value;
// ...
};
However, updating the stored value will require the iterator to check whether its current index is less than the container's size, so that a past-the-end iterator doesn't cause undefined behavior by accessing past the end of the underlying arrays. That adds (a small amount of) runtime overhead — not enough to make the solution impractical, of course, but it feels a little inelegant. The iterator shouldn't really need to store anything but a pointer to the container it's iterating and the current position within it.
Is there a clean, well-established way to support operator-> for iterator types that construct their values on-demand? How would other developers do this sort of thing?
(Note that I don't really need to support operator-> at all — I'm implementing the iterator mainly so that the container can be traversed with a C++11 "range for" loop, and std::tuple doesn't have any members that one would typically want to access via -> anyway. But I'd like to model the iterator concepts properly nonetheless; it feels like I'm cutting corners otherwise. Or should I just not bother?)
template<class T>
struct pseudo_ptr {
T t;
T operator*()&&{return t;}
T* operator->(){ return &t; }
};
then
struct bar { int x,y; };
struct bar_iterator:std::iterator< blah, blah >{
// ...
pseudo_ptr<bar> operator->() const { return {**this}; }
// ...
};
This relies on how -> works.
ptr->b for pointer ptr is simply (*ptr).b.
Otherwise it is defined as (ptr.operator->())->b. This evaluates recursively if operator-> does not return a pointer.
The pseudo_ptr<T> above gives you a wrapper around a copy of T.
Note, however, that lifetime extension doesn't really work. The result is fragile.
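To make the drill-down behaviour concrete, here is a minimal self-contained sketch. The names arrow_proxy, zip_iterator and sum_of_products are hypothetical, and std::pair stands in for the tuple from the question so that -> has members to reach:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Holds a copy of the value so that -> has an object to point at.
template <class T>
struct arrow_proxy {
    T value;
    T* operator->() { return &value; }
};

// Builds its values on demand from two parallel arrays.
class zip_iterator {
public:
    zip_iterator(const int* a, const int* b, std::size_t i) : a_(a), b_(b), i_(i) {}
    std::pair<int, int> operator*() const { return {a_[i_], b_[i_]}; }
    // Returns a non-pointer, so the compiler applies -> again on the proxy.
    arrow_proxy<std::pair<int, int>> operator->() const { return {**this}; }
    zip_iterator& operator++() { ++i_; return *this; }
    friend bool operator!=(const zip_iterator& x, const zip_iterator& y) {
        return x.i_ != y.i_;
    }
private:
    const int* a_;
    const int* b_;
    std::size_t i_;
};

inline int sum_of_products(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    int total = 0;
    const zip_iterator last(a.data(), b.data(), a.size());
    for (zip_iterator it(a.data(), b.data(), 0); it != last; ++it) {
        // Each -> creates a temporary proxy that lives until the end of
        // the full expression, so the pointer must not be stored.
        total += it->first * it->second;
    }
    return total;
}
```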
Here's an example relying on the fact that operator-> is applied repeatedly until a pointer is returned. We make Iterator::operator-> return the Contained object as a temporary. This causes the compiler to reapply operator->. We then make Contained::operator-> simply return a pointer to itself. Note that if we don't want to put operator-> in the Contained on-the-fly object, we can wrap it in a helper object that returns a pointer to the internal Contained object.
#include <cstddef>
#include <iostream>
class Contained {
public:
Contained(int a_, int b_) : a(a_), b(b_) {}
const Contained *operator->() {
return this;
}
const int a, b;
};
class MyContainer {
public:
class Iterator {
friend class MyContainer;
public:
friend bool operator!=(const Iterator &it1, const Iterator &it2) {
return it1.current_index != it2.current_index;
}
private:
Iterator(const MyContainer *c, std::size_t ind) : container(c), current_index(ind) {}
public:
Iterator &operator++() {
++current_index;
return *this;
}
// -> is reapplied, since this returns a non-pointer.
Contained operator->() {
return Contained(container->underlying_data_a[current_index], container->underlying_data_b[current_index]);
}
Contained operator*() {
return Contained(container->underlying_data_a[current_index], container->underlying_data_b[current_index]);
}
private:
const MyContainer *const container;
std::size_t current_index;
};
public:
MyContainer() {
for (int i = 0; i < 10; i++) {
underlying_data_a[i] = underlying_data_b[i] = i;
}
}
Iterator begin() const {
return Iterator(this, 0);
}
Iterator end() const {
return Iterator(this, 10);
}
private:
int underlying_data_a[10];
int underlying_data_b[10];
};
int
main() {
MyContainer c;
for (const auto &e : c) {
std::cout << e.a << ", " << e.b << std::endl;
}
}

Using C-style arrays as backend for STL string operations

I'm writing a library to read some specific file formats. The files are read through memory-mapped files (boost::interprocess templates). On these files I have to do some searches with std::regex. To avoid unnecessary copying I want to use the memory-mapped file directly (as a C-style char array).
After some research I came up with the following two approaches:
Using the pubsetbuf method of a streambuf object
Using the char* pointer as iterator
but since the implementation of the first one is optional for the STL vendor, I'm stuck with the second approach. Since the constructor for std::string::iterator is declared private and the whole iterator implementation also seems to be vendor specific, I wrote my own iterator:
template<typename T>
class PointerIterator: std::iterator<std::input_iterator_tag, T> {
public:
PointerIterator(T* first, std::size_t count): first_(first), last_(first + count) {}
PointerIterator(T* first, T* last): first_(first), last_(last) {}
class iterator {
public:
iterator(T* p): ptr_(p) {}
iterator(const iterator& it): ptr_(it.ptr_) {}
iterator& operator++() {
++ptr_;
return *this;
}
iterator operator++(int) {
iterator temp(*this);
++ptr_;
return temp;
}
bool operator==(const iterator& it) { return ptr_ == it.ptr_; }
bool operator!=(const iterator& it) { return ptr_ != it.ptr_; }
T& operator*() { return *ptr_; }
private:
T* ptr_;
};
iterator begin() {
return iterator(first_);
}
iterator end() {
return iterator(last_);
}
private:
T* first_;
T* last_;
};
The iterator works, but for use with the std::regex_search function (or other char-related STL functions) it must be of the same type as the STL iterators.
Is there some generic approach to cast my iterators to the STL ones (portable across STL implementations), or to achieve the entire thing with another approach I didn't mention?
Edit:
The source using std::regex_search:
std::regex re(...);
boost::interprocess::mapped_region region(...);
char* first = static_cast<char*>(region.get_address());
char* last = first + 5000;
// ...
PointerIterator<char> wrapper(first, last);
std::smatch match;
while (std::regex_search(wrapper.begin(), wrapper.end(), match, re)) { // Error: No matching function call to 'regex_search'
// do something
}
Thanks
The definition of std::smatch is a specialization of std::match_results. This specialization uses string::const_iterator as the iterator type in the template arguments passed to std::match_results. This requires the begin and end arguments passed to std::regex_search to also be of type string::const_iterator.
In C++, pointers satisfy the requirements of bidirectional iterators (they are in fact random access iterators), so it is not necessary to wrap them in an iterator class. If you need to search through a buffer pointed to by a char pointer, you can either use std::cmatch or use std::match_results and specify the iterator type explicitly. Keep in mind that the iterator type of the match object must be exactly the type of the two iterators passed to std::regex_search, so the two examples below pass const char* pointers taken from your first and last rather than the iterators returned by PointerIterator. I have also included a stand-alone example you can reference.
const char* begin = first; // must be const char* to match std::cmatch
const char* end = last;
std::cmatch match; // <<--
while (std::regex_search(begin, end, match, re))
{
// do something
}
...using std::match_results instead.
const char* begin = first;
const char* end = last;
std::match_results<const char*> match; // <<--
while (std::regex_search(begin, end, match, re))
{
// do something
}
Below is a stand alone example that should provide a bit of codified clarification. It is based on the example on cppreference.com and uses const char* instead of std::string as the search target.
#include <cstddef>
#include <cstring>
#include <iostream>
#include <regex>
#include <string>
int main()
{
const char *haystack = "Roses are #ff0000";
const std::size_t size = std::strlen(haystack);
std::regex pattern(
"#([a-f0-9]{2})"
"([a-f0-9]{2})"
"([a-f0-9]{2})");
std::cmatch results;
std::regex_search(haystack, haystack + size, results, pattern);
for (size_t i = 0; i < results.size(); ++i) {
std::csub_match sub_match = results[i];
std::string sub_match_str = sub_match.str();
std::cout << i << ": " << sub_match_str << '\n';
}
}
This produces the following output.
0: #ff0000
1: ff
2: 00
3: 00
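For a memory-mapped buffer that is not null-terminated, the same pointer-based approach works with an explicit size. As a sketch (count_matches is a hypothetical helper, and the buffer and pattern are placeholders), std::cregex_iterator walks all matches in [data, data + size) without copying the buffer into a std::string:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>

// Counts non-overlapping matches of `pattern` in the buffer [data, data + size).
inline std::size_t count_matches(const char* data, std::size_t size,
                                 const char* pattern) {
    assert(data != nullptr && pattern != nullptr);
    const std::regex re(pattern);  // must outlive the iterators below
    const std::cregex_iterator first(data, data + size, re);
    const std::cregex_iterator last;  // default-constructed end-of-sequence
    return static_cast<std::size_t>(std::distance(first, last));
}
```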

pointer to vector at index Vs iterator

I have a vector< Object > myvec which I use in my code to hold a list of objects in memory. I keep a pointer to the current object in that vector in the "normal" C fashion by using
Object* pObj = &myvec[index];
This all works fine as long as myvec doesn't grow so much that it is moved around during a push_back, at which point pObj becomes invalid. Vectors guarantee that their data is contiguous, hence they make no effort to keep the vector at the same memory location.
I can reserve enough space for myvec to prevent this, but I don't like that solution.
I could keep the index of the selected myvec position and when I need to use it just access it directly, but it's a costly modification to my code.
I'm wondering if iterators keep their references intact as a vector is reallocated/moved, and if so, can I just replace
Object* pObj = &myvec[index];
by something like
vector<Object>::iterator it = myvec.begin()+index;
What are the implications of this?
Is this doable?
What is the standard pattern to save pointers to vector positions?
Cheers
No... using an iterator you would have the same exact problem. If a vector reallocation is performed then all iterators are invalidated and using them is Undefined Behavior.
The only solution that is reallocation-resistant with an std::vector is using the integer index.
Using std::list, for example, things are different, but there are also different efficiency trade-offs, so it really depends on what you need to do.
Another option would be to create your own "smart index" class, that stores a reference to the vector and the index. This way you could keep just passing around one "pointer" (and you could implement pointer semantic for it) but the code wouldn't suffer from reallocation risks.
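A minimal sketch of such a "smart index" might look like the following. The names vector_handle and survives_growth are hypothetical, and the handle assumes that the vector itself outlives it and that no elements are inserted or erased before the stored index:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pointer-like handle that stores the vector and an index, not an address.
template <class T>
class vector_handle {
public:
    vector_handle(std::vector<T>& v, std::size_t index) : v_(&v), index_(index) {}
    T& operator*() const {
        assert(index_ < v_->size());
        return (*v_)[index_];  // element is looked up again on every access
    }
    T* operator->() const { return &**this; }
    std::size_t index() const { return index_; }
private:
    std::vector<T>* v_;
    std::size_t index_;
};

// Returns true if the handle still refers to the same element after the
// vector has grown far beyond its initial capacity.
inline bool survives_growth() {
    std::vector<int> v{10, 20, 30};
    vector_handle<int> h(v, 1);
    for (int i = 0; i < 10000; ++i) {
        v.push_back(i);  // will typically reallocate several times
    }
    return *h == 20;
}
```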
Iterators are (potentially) invalidated by anything that could resize the vector (e.g., push_back).
You could, however, create your own iterator class that stored the vector and an index, which would be stable across operations that resized the vector:
#include <iterator>
#include <algorithm>
#include <iostream>
#include <vector>
namespace stable {
template <class T, class Dist=ptrdiff_t, class Ptr = T*, class Ref = T&>
class iterator : public std::iterator<std::random_access_iterator_tag, T, Dist, Ptr, Ref>
{
T &container_;
size_t index_;
public:
iterator(T &container, size_t index) : container_(container), index_(index) {}
iterator operator++() { ++index_; return *this; }
iterator operator++(int) { iterator temp(*this); ++index_; return temp; }
iterator operator--() { --index_; return *this; }
iterator operator--(int) { iterator temp(*this); --index_; return temp; }
iterator operator+(Dist offset) { return iterator(container_, index_ + offset); }
iterator operator-(Dist offset) { return iterator(container_, index_ - offset); }
bool operator!=(iterator const &other) const { return index_ != other.index_; }
bool operator==(iterator const &other) const { return index_ == other.index_; }
bool operator<(iterator const &other) const { return index_ < other.index_; }
bool operator>(iterator const &other) const { return index_ > other.index_; }
typename T::value_type &operator *() { return container_[index_]; }
typename T::value_type &operator[](size_t index) { return container_[index_ + index]; }
};
template <class T>
iterator<T> begin(T &container) { return iterator<T>(container, 0); }
template <class T>
iterator<T> end(T &container) { return iterator<T>(container, container.size()); }
}
#ifdef TEST
int main() {
std::vector<int> data;
// add some data to the container:
for (int i=0; i<10; i++)
data.push_back(i);
// get iterators to the beginning/end:
stable::iterator<std::vector<int> > b = stable::begin(data);
stable::iterator<std::vector<int> > e = stable::end(data);
// add enough more data that the container will (probably) be resized:
for (int i=10; i<10000; i++)
data.push_back(i);
// Use the previously-obtained iterators:
std::copy(b, e, std::ostream_iterator<int>(std::cout, "\n"));
// These iterators also support most pointer-like operations:
std::cout << *(b+125) << "\n";
std::cout << b[150] << "\n";
return 0;
}
#endif
Since we can't embed this as a nested class inside of the container like a normal iterator class, this requires a slightly different syntax to declare/define an object of this type; instead of the usual std::vector<int>::iterator whatever;, we have to use stable::iterator<std::vector<int> > whatever;. Likewise, to obtain the beginning of a container, we use stable::begin(container).
There is one point that may be a bit surprising (at least at first): when you obtain a stable::end(container), that gets you the end of the container at that time. As shown in the test code above, if you later add more items to the container, the iterator you obtained previously is not adjusted to reflect the new end of the container -- it retains the position it had when you obtained it (i.e., the position that was the end of the container at that time, but isn't any more).
No, iterators are invalidated after vector growth.
The way to get around this problem is to keep the index to the item, not a pointer or iterator to it. This is because the item stays at its index, even if the vector grows, assuming of course that you don't insert any items before it (thus changing its index).