recently I was thinking about how it would be nice if iterators implicitly converted to bool so you could do
auto it = find(begin(x),end(x), 42);
if (it) //not it!=x.end();
{
}
but thinking about it I realized that this would mean that either it would had to be set to "NULL", so that you couldnt use the it directly if you want to do something with it (you would have to use x.end()) or you could use it but iter size would have to be bigger(to store if what it points to was .end() or not).
So my questions are:
Is the syntax in my example achievable without breaking current code and without increasing the sizeof iterator?
would implicitly conversion to bool cause some problems?
You are working on the assumption that iterators are a way of accessing a container. They allow you to do that, but they also allow many more things that would clearly not fit your intended operations:
auto it = std::find(std::begin(x), std::next(std::begin(x),10), 42 );
// Is 42 among the first 10 elements of 'x'?
auto it = std::find(std::istream_iterator<int>(std::cout),
std::istream_iterator<int>(), 42 );
// Is 42 one of the numbers from standard input?
In the first case the iterator does refer to a container, but the range where you are finding does not enclose the whole container, so it cannot be tested against end(x). In the second case there is no container at all.
Note that an efficient implementation of an iterator for many containers holds just a pointer, so any other state would increase the size of the iterator.
Regarding conversions to any-type or bool, they do cause many problems, but they can be circumvented in C++11 by means of explicit conversions, or in C++03 by using the safe-bool idiom.
You are probably more interested on a different concept: ranges. There are multiple approaches to ranges, so it is not so clear what the precise semantics should be. The first two that come to mind are Boost.Iterator and an article by Alexandrescu I read recently called On Iteration.
Two reasons this wouldn't work:
First, it's possible to use raw pointers as iterators (usually into an array):
int data[] = { 50, 42, 37, 5 };
auto it = find(begin(data), end(data), 42);
Second, you don't have to pass the actual end of the container to find; to e.g. find the first space character before a period:
auto sentence = "Hello, world.";
auto it1 = find(begin(sentence), end(sentence), '.');
auto it2 = find(begin(sentence), it1, ' ');
There's very little that can be done with a single iterator. A pair of iterators defines a sequence consisting of the elements; the first iterator points to the first element and the second iterator points one past the end of the last element. There's no way, in general, for the first iterator to know when it's been incremented to match the second iterator. Algorithms do that, because they have both iterators and can tell when the work is done. For example:
std::vector<int> vec;
vec.push_back(1);
vec.push_back(2);
vec.push_back(3);
// copy the contents of the vector:
std::copy(somewhere, vec.begin(), vec.end());
// copy the first two elements of the vector:
std::copy(somewhere, vec.begin(), vec.begin() + 2);
In both calls to copy, vec.begin() is the same iterator; the algorithm does different things because it got the second iterator that tells it when to stop.
Granted, it's possible to design a different kind of iterator that contains both the beginning and the end of the sequence (as Java does), but that's not how C++ iterators are designed. There's discussion of standardizing the notion of a "range" that holds two iterators (the new range-based for loop is a first step toward that).
Well, you probably don't want an implicit conversion, but requiring two
separate objects to determine when iteration is done is clearly a design
error. It's not so much because of the if or the for (although
using a single iterator would make these clearer as well); it's really
because it makes functional decomposition and filtering iterators
extreamly difficult, if not impossible.
Fundamentally, STL iterators are closer to smart pointers than they are
to iterators. There are times when such pointers are appropriate, but
they aren't a good replacement for iterators.
Related
I am implementing a container that presents a map-like interface. The physicals implementation is an std::vector<std::pair<K*, T>>. A K object remembers its assigned position in the vector. It is possible for a K object to get destroyed. In that case its remembered index is used to zero out its corresponding key pointer within the vector, creating a tombstone.
I would like to expose the full traditional collection of iterators, though I think that they need only claim to be forward_iterators (see next).
I want to be able to use range-based for loop iteration to return the only non-tombstoned elements. Further, I would like the implementation of my iterators to be a single pointer (i.e. no back pointer to the container).
Since the range-based for loop is pretested I think that I can implement tombstone skipping within the inequality predicate.
bool operator != (MyInterator& cursor, MyIterator stop) {
while (cursor != stop) {
if (cursor->first)
return true;
++cursor;
}
return false;
}
Is this a reasonable approach? If yes, is there a simple way for me to override the inequality operator of std::vector's iterators instead of implementing my iterators from scratch?
If this is not a reasonable approach, what would be better?
Is this a reasonable approach?
No. (Keep in mind that operator!= can be used outside a range-based for loop.)
Your operator does not accept a const object as its first parameter (meaning a const vector::iterator).
You have undefined behavior if the first parameter comes after the second (e.g. if someone tests end != cur instead of cur != end).
You get this weird case where, given iterators a and b, it might be that *a is different than *b, but if you check if (a != b) then you find that the iterators are equal and then *a is the same as *b. This probably wrecks havoc with the multipass guarantee of forward iterators (but the situation is bizarre enough that I would want to check the standard's precise wording before passing judgement). Messing with people's expectations is inadvisable.
There is no simple way to override the inequality operator of std::vector's iterators.
If this is not a reasonable approach, what would be better?
You already know what would be better. You're just shying away from it.
Implement your own iterators from scratch. Wrapping your vector in your own class has the benefit that only the code for that class has to be aware that tombstones exist.
Caveat: Document that the conditions that create a tombstone also invalidate iterators to that element. (Invalid iterators are excluded from most iterator requirements, such as the multipass guarantee.)
OR
While your implementation makes a poor operator!=, it could be a fine update or check function. There's this little-known secret that C++ has more looping structures than just range-based for loops. You could make use of one of these, for example:
for ( cur = vec.begin(); skip_tombstones(cur, vec.end()); ++cur ) {
auto& element = *cur;
where skip_tombstones() is basically your operator!= renamed. If not much code needs to iterate over the vector, this might be a reasonable option, even in the long term.
I would like to initialise a vector by transforming the other one. I made a test with two ways of inline initialisation a transformed std::vector.
One using lambda inline initialisation (with std::transform):
std::vector<int> foo(100,42);
const auto fooTimesTwo = [&]{
std::vector<int> tmp(foo.size());
std::transform(foo.begin(), foo.end(), tmp.begin(), convert);
return tmp;
}();
And the other one - using std::ranges::views::transform:
std::vector<int> foo(100,42);
auto transform_range = (foo | std::ranges::views::transform(convert));
std::vector<int> fooTimesTwo {
transform_range.begin(),
transform_range.end()
};
I expected that both way of vector initialisation should have similar performance, but from some reason the benchmark of solution with traditional std::transform is much faster that the second one (9.7 times faster -> https://quick-bench.com/q/3PSDRO9UbMNunUpdWGNShF49WlQ ).
My questions are:
Do I use std::ranges::views::transform incorrectly?
Why does it work so much slower?
Side note - it can be done using boost::make_transform_iterator, but I couldn't check it on quick-bench as they don't support boost. So I'm not sure about efficiency of such solution.
Why does it work so much slower?
The problem you're running into is one of the differences between the C++98/C++17 iterator model and the C++20 iterator model. One of the old requirements for X to be a forward iterator was:
if X is a mutable iterator, reference is a reference to T; if X is a constant iterator, reference is a reference to const T,
That is, the iterator's reference type had to be a true reference. It could not be a proxy reference or a prvalue. Any iterator whose reference is a prvalue is automatically an input iterator only.
There is no such requirement in C++20.
So if you look at foo | std::ranges::views::transform(convert), this is a range of prvalue int. In the C++20 iterator model, this is a random access range. But in the C++17, because we're dealing with prvalues, this is an input range only.
vector's iterator-pair constructor is not based on the C++20 iterator model, it is based on the C++98/C++17 iterator model. It's using the old understanding of iterator category, not the new understanding. And the C++20 range adaptors work very hard to ensure that they do the "right thing" with respect to the old iterator model. Our adapted range does correctly advertise itself as random access when checked as C++20 and input when checked as C++17:
void f(std::vector<int> v) {
auto r = v | std::views::transform(convert);
using R = decltype(r);
static_assert(std::ranges::random_access_range<R>);
static_assert(std::same_as<std::input_iterator_tag,
std::iterator_traits<std::ranges::iterator_t<R>>::iterator_category>);
}
So what happens when you pass two input iterators into vector's iterator pair constructor? Well, it can't allocate a huge chunk up front (we can't do last - first here because it's an input iterator, the notion that having a "sized sentinel" might be independent from the traversal category is also a new thing in the C++20 iterator model)... instead it basically does this:
for (; first != last; ++first) {
push_back(*first);
}
Can't do any better than that for an input iterator. But that's super inefficient, because we end up doing like eight allocations instead of one.
In range-v3, you can do this:
auto result = foo | ranges::views::transform(convert)
| ranges::to<std::vector>();
The to algorithm understands the C++20 iterator model and does the right thing by reserving in advance here. However, to is very limited in the fact that it's an external library and we can't just modify standard library types to opt into it. We hope to have a std::ranges::to in C++23 that will come with improvements to the standard library containers to do this even better. At that point, this solution will be better than your original solution since std::vector<int> foo(tmp.size()) is itself wasteful by having to zero-initialize a chunk of memory, only to immediately overwrite it.
In the meantime, I do wonder both about the general value of preserving this reference-must-be-reference requirement (that few people even know about and probably even fewer rely on: the biggest value is probably just knowing that operator->() can return &operator*()?).
std::vector<bool> already lies about this and advertises itself as being a random access C++17 range, for instance.
Standard library implementations though should be able to handle this case better. They should be able to safely check for the C++20 iterator concepts and do something intelligent as a result. In this case, we have C++20 random access iterators, so vector should be able to construct itself efficiently in this context. Submitted 100070.
Reading a C++ book I encountered the following example on using iterators:
vector<string::iterator> find_all(string& s, char c)
{
vector<string::iterator> res;
for(auto p = s.begin(); p != s.end(); ++p)
if(*p == c)
res.push_back(p);
return res;
}
void test()
{
string m {"Mary had a little lamb"};
for(auto p : find_all(m, 'a'))
if(*p != 'a')
cerr << "a bug!\n";
}
I'm a little confused about what the vector returned by find_all() contains. Is it essentially "pointers" to the elements of the string m created above it?
Thanks.
I'm a little confused about what the vector returned by find_all() contains. Is it essentially "pointers" to the elements of the string m created above it?
Mostly; iterators aren't (necessarily) pointers, they are somewhat a generalization of the pointer concept. They are used to point to specific objects stored inside containers (in this case, characters inside a string), you can use them to move between the elements of the string (via the usual arithmetic operators - when they are supported) and you "dereference" them with * to get a reference to the pointed object.
Notice that, depending from the container, they are implemented differently and provide different features; an iterator to a std::list, for example, will allow ++, -- and *, but not moving to arbitrary locations, and an iterator to a singly-linked list won't even support --, while typically iterators to array-like data structures (like vector or string) will allow completely free movement.
To refer to elements in array-like structures often one just stores indexes, since they are cheap to store and use; for other structures, instead, storing iterators may be more convenient.
For example, just yesterday I had some code which walked a unordered_set<string, int> (=a hashtable that mapped some words to their occurrences) to "take note" of some of the (string, int) couples to use them later.
The equivalent of storing vector indexes here would have been storing the hashtable's keys, but (1) they are strings (so they are moderately costly to allocate and handle), and (2) to use them to reach the corresponding object I had to do another hashtable lookup later. Instead, storing iterators in a vector guarantees no hassle for storing strings (iterators are intended to be cheap to handle) and no need to perform a lookup again.
Yes, iterators are like pointers. std::string::iterator can even be an alias for char *, although it's usually not.
In general, iterators provide a subset of pointer functionality. Which subset depends on the iterator. Your book probably covers this, but all iterators can be dereferenced (*, but there is never a reference & operation) and incremented (++), then some additionally provide --, and some add + and - on top of that.
In this case, the function seems to assume you will only be querying the values of the iterators without modifying the string. Because the allocation block used for string storage may change as the string grows, iterators (like pointers) into the string may be invalidated. This is why std::string member functions like string::find return index numbers, not iterators.
A vector of indexes could be a better design choice, but this is good enough for an example.
I'm a Java developer. I'm currently learning C++. I've been looking at code samples for sorting. In Java, one normally gives a sorting method the container it needs to sort e.g
sort(Object[] someArray)
I've noticed in C++ that you pass two args, the start and end of the container. My question is that how is the actual container accessed then?
Here's sample code taken from Wikipedia illustrating the the sort method
#include <iostream>
#include <algorithm>
#include <vector>
int main() {
std::vector<int> vec;
vec.push_back(10); vec.push_back(5); vec.push_back(100);
std::sort(vec.begin(), vec.end());
for (int i = 0; i < vec.size(); ++i)
std::cout << vec[i] << ' ';
}
vec.begin() and vec.end() are returning iterators iterators. The iterators are kind of pointers on the elements, you can read them and modify them using iterators. That is what sort is doing using the iterators.
If it is an iterator, you can directly modify the object the iterator is referring to:
*it = X;
The sort function does not have to know about the containers, which is the power of the iterators. By manipulating the pointers, it can sort the complete container without even knowing exactly what container it is.
You should learn about iterators (http://www.cprogramming.com/tutorial/stl/iterators.html)
vec.begin() and vec.end() do not return the first and last elements of the vector. They actually return what is known as an iterator. An iterator behaves very much like a pointer to the elements. If you have an iterator i that you initialised with vec.begin(), you can get a pointer to the second element in the vector just by doing i++ - the same as you would if you had a point to the first element in an array. Likewise you can do i-- to go backwards. For some iterators (known as random access iterators), you can even do i + 5 to get an iterator to the 5th element after i.
This is how the algorithm accesses the container. It knows that all of the elements that it should be sorting are between begin() and end(). It navigates around the elements by doing simple iterator operations. It can then modify the elements by doing *i, which gives the algorithm a reference to the element that i is pointing at. For example, if i is set to vec.begin(), and you do *i = 5;, you will change the value of the first element of vec.
This approach allows you to pass only part of a vector to be sorted. Let's say you only wanted to sort the first 5 elements of your vector. You could do:
std::sort(vec.begin(), vec.begin() + 5);
This is very powerful. Since iterators behave very much like pointers, you can actually pass plain old pointers too. Let's say you have an array int array[] = {4, 3, 2, 5, 1};, you could easily call std::sort(array, array + 5) (because the name of an array will decay to a pointer to its first element).
The container doesn't have to be accessed. That's the whole point of the design behind the Standard Template Library (which became part of the C++ standard library): The algorithms don't know anything about containers, just iterators.
This means they can work with anything that provides a pair of iterators. Of course all STL containers provide begin() and end() methods, but you can also use a regular old C array, or an MFC or glib container, or anything else, just by writing your own iterators for it. (And for C arrays, it's as simple as a and a+a_len for the begin and end iterators.)
As for how it works under the covers: Iterators follow an implicit protocol: you can do things like ++it to advance an iterator to the next element, or *it to get the value of the current element, or *it = 3 to set the value of the current element. (It's a bit more complicated than this, because there are a few different protocols—iterators can be random-access or forward-only, const or writable, etc. But that's the basic idea.) So, if `sort is coded to restrict itself to the iterator protocol (and, of course, it is), it works with anything that conforms to that protocol.
To learn more, there are many tutorials on the internet (and in the bookstore); there's only so much an SO answer can explain.
begin() and end() return iterators. See e.g. http://www.cprogramming.com/tutorial/stl/iterators.html
Iterators act like references into part of a container. That is, *iter = z; actually changes one of the elements in the container.
std::sort actually uses a swap function on references to the contained objects, so that any iterators you have already initialized remain in the same order but the values those iterators refer to are changed.
Note that std::list also has member functions called sort. It works the other way around: any iterators you have already initialized keep the same values, but the order of those iterators changes.
I am a little bit frustrated of how to use vectors in C++. I use them widely though I am not exactly certain of how I use them. Below are the questions?
If I have a vector lets say: std::vector<CString> v_strMyVector, with (int)v_strMyVector.size > i can I access the i member: v_strMyVector[i] == "xxxx"; ? (it works, though why?)
Do i always need to define an iterator to acces to go to the beginning of the vector, and lop on its members ?
What is the purpose of an iterator if I have access to all members of the vector directly (see 1)?
Thanks in advance,
Sun
It works only because there's no bounds checking for operator[], for performance reason. Doing so will result in undefined behavior. If you use the safer v_strMyVector.at(i), it will throw an OutOfRange exception.
It's because the operator[] returns a reference.
Since vectors can be accessed randomly in O(1) time, looping by index or iterator makes no performance difference.
The iterator lets you write an algorithm independent of the container. This iterator pattern is used a lot in the <algorithm> library to allow writing generic code easier, e.g. instead of needing N members for each of the M containers (i.e. writing M*N functions)
std::vector<T>::find(x)
std::list<T>::find(x)
std::deque<T>::find(x)
...
std::vector<T>::count(x)
std::list<T>::count(x)
std::deque<T>::count(x)
...
we just need N templates
find(iter_begin, iter_end, x);
count(iter_begin, iter_end, x);
...
and each of the M container provide the iterators, reducing the number of function needed to just M+N.
It returns a reference.
No,, because vector has random access. However, you do for other types (e.g. list, which is a doubly-linked list)
To unify all the collections (along with other types, like arrays). That way you can use algorithms like std::copy on any type that meets the requirements.
Regarding your second point, the idiomatic C++ way is not to loop at all, but to use algorithms (if feasible).
Manual looping for output:
for (std::vector<std::string>::iterator it = vec.begin(); it != end(); ++it)
{
std::cout << *it << "\n";
}
Algorithm:
std::copy(vec.begin(), vec.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
Manual looping for calling a member function:
for (std::vector<Drawable*>::iterator it = vec.begin(); it != end(); ++it)
{
(*it)->draw();
}
Algorithm:
std::for_each(vec.begin(), vec.end(), std::mem_fun(&Drawable::draw));
Hope that helps.
Workd because the [] operator is overloaded:
reference operator[](size_type n)
See http://www.sgi.com/tech/stl/Vector.html
Traversing any collection in STL using iterator is a de facto.
I think one advantage is if you replace vector by another collection, all of your code would continue to work.
That's the idea of vectors, they provide direct access to all items, much as regular arrays. Internally, vectors are represented as dynamically allocated, contiguous memory areas. The operator [] is defined to mimic semantics of the regular array.
Having an iterator is not really required, you may as well use an index variable that goes from 0 to v_strMtVector.size()-1, as you would do with regular array:
for (int i = 0; i < v_strMtVector.size(); ++i) {
...
}
That said, using an iterator is considered to be a good style by many, because...
Using an iterator makes it easier to replace underlying container type, e.g. from std::vector<> to std::list<>. Iterators may also be used with STL algorithms, such as std::sort().
std::vector is a type of sequence that provides constant time random access. You can access a reference to any item by reference in constant time but you pay for it when inserting into and deleting from the vector as these can be very expensive operations. You do not need to use iterators when accessing the contents of the vector, but it does support them.