Johannes Schaub claims here
always use the prefix increment form for iterators whose definitions
you don't know. That will ensure your code runs as generic as
possible.
for(std::vector<T>::iterator it = v.begin(); it != v.end(); ++it) {
/* std::cout << *it; ... */
}
Why doesn't this first iterate it, then start the loop (at v.begin() + 1)?
The iteration statement is always executed at the end of each iteration. That is regardless of the type of increment operator you use, or whether you use an increment operator at all.
The result of the iteration statement expression is not used, so it has no effect on how the loop behaves. The statement:
++it;
Is functionally equivalent to the statement:
it++;
Postfix and prefix increment expressions have different behaviour only when the result of the expression is used.
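A minimal sketch of the one case where the two forms do differ, namely when the value of the increment expression is used. The function names read_then_advance and advance_then_read are made up for this illustration.

```cpp
#include <vector>

// Postfix: the expression yields the OLD iterator value, so the element
// read is the one the iterator pointed to before the increment.
int read_then_advance(std::vector<int>::const_iterator& it) {
    return *it++;
}

// Prefix: the expression yields the incremented iterator, so the element
// read is the one after the original position.
int advance_then_read(std::vector<int>::const_iterator& it) {
    return *++it;
}
```

In both functions the iterator ends up one step further; only the element that gets read differs.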
Why use the prefix increment form for iterators?
Because the postfix operation implies a copy. Copying an iterator is never faster than not copying it, and is potentially slower.
A typical implementation of postfix increment:
iterator tmp(*this); // copy
++(*this); // prefix increment
return tmp; // return copy of the temporary
// (this copy can be elided by NRVO)
When the result is not used, even the first copy can be optimized away, but only if the operation is expanded inline, and that is not guaranteed.
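To make the copy visible, here is a rough sketch using a hypothetical CountingIter type (not a real iterator) whose copy constructor bumps a counter. Because that copy constructor has a visible side effect, the optimizer cannot silently drop the copy of tmp, which is different from what may happen with real, trivially copyable iterators.

```cpp
#include <cstddef>

// Hypothetical iterator-like type that counts how often it is copied.
struct CountingIter {
    static std::size_t copies;
    int pos;

    explicit CountingIter(int p) : pos(p) {}
    CountingIter(const CountingIter& other) : pos(other.pos) { ++copies; }
    CountingIter& operator=(const CountingIter&) = default;

    // Prefix: modify in place and return a reference. No copy is made.
    CountingIter& operator++() { ++pos; return *this; }

    // Postfix: the old value must be kept so that it can be returned.
    CountingIter operator++(int) {
        CountingIter tmp(*this); // copy
        ++(*this);               // prefix increment
        return tmp;              // this second copy may be elided (NRVO)
    }
};

std::size_t CountingIter::copies = 0;

// Number of copies made by n prefix increments.
std::size_t copies_for_prefix(int n) {
    CountingIter::copies = 0;
    CountingIter it(0);
    for (int i = 0; i < n; ++i) ++it;
    return CountingIter::copies;
}

// Number of copies made by n postfix increments whose result is discarded.
std::size_t copies_for_postfix(int n) {
    CountingIter::copies = 0;
    CountingIter it(0);
    for (int i = 0; i < n; ++i) it++;
    return CountingIter::copies;
}
```

With this type, copies_for_prefix(n) reports zero copies, while copies_for_postfix(n) reports at least n; the exact number depends on whether the compiler elides the copy on return.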
I wouldn't blindly use the rule "always use prefix increment with iterators". Some algorithms are clearer to express with postfix, although that is just my opinion. An example of an algorithm suitable for postfix increment:
template<class InIter, class OutIter>
OutIter copy(InIter first, InIter last, OutIter out) {
while(first != last)
*out++ = *first++;
return out;
}
Note that your code is equivalent to
for(std::vector<T>::iterator it = v.begin(); it != v.end(); ) {
/* std::cout << *it; ... */
++it;
}
and it should be readily apparent that it doesn't matter if you write ++it; or it++;. (This also addresses your final point.)
But conceptually it++ needs to store, in its implementation, a copy of the unincremented value, as that is what the expression evaluates to.
it might be a big heavy object of which taking a value copy is computationally expensive, and your compiler might not be able to optimise away that implicit value copy taken by it++.
These days, for most containers, a compiler will optimise the arguably clearer it++ to ++it if the value of the expression is not used; i.e. the generated code will be identical.
I follow the author's advice and always use the pre-increment whenever possible, but I am (i) old fashioned and (ii) aware that plenty of expert programmers don't, so it's largely down to personal choice.
Why doesn't this first iterate it, then start the loop (at v.begin() + 1)?
Because a for loop is defined to be equivalent to:
{
init_statement
while ( condition ) {
statement
iteration_expression ;
}
}
So
for(std::vector<T>::iterator it = v.begin(); it != v.end(); ++it) {
/* std::cout << *it; ... */
}
is equivalent to
{
std::vector<T>::iterator it = v.begin();
while ( it != v.end() ) {
/* std::cout << *it; ... */
++it ;
}
}
That means the loop body first runs with it at v.begin(), and only then is it stepped forward. Prefix increment means: increase the value, then return a reference to the incremented object. As you can see, the returned object is not used at all in this case, so ++it and it++ lead to the same result.
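The equivalence can be checked with a small sketch. The three helper functions below are invented for this illustration; each records the elements in the order it visits them.

```cpp
#include <vector>

// for loop, prefix increment in the iteration expression.
std::vector<int> visit_for_prefix(const std::vector<int>& v) {
    std::vector<int> seen;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        seen.push_back(*it);
    return seen;
}

// for loop, postfix increment in the iteration expression.
std::vector<int> visit_for_postfix(const std::vector<int>& v) {
    std::vector<int> seen;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); it++)
        seen.push_back(*it);
    return seen;
}

// Hand-expanded while form: the increment runs after the body.
std::vector<int> visit_while(const std::vector<int>& v) {
    std::vector<int> seen;
    {
        std::vector<int>::const_iterator it = v.begin();
        while (it != v.end()) {
            seen.push_back(*it);
            ++it;
        }
    }
    return seen;
}
```

All three return the elements starting at v.begin(). One caveat about the while form: a continue inside a for body still runs the iteration expression, whereas in the hand-expanded while loop it would skip the increment.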
I'm confused as to whether I should increment an OutputIterator when I set it. See the following code in which I'm trying to split a string. My problem is with the third line of the while loop, the code seems to work fine whether I have *oit = ... or *oit++ = ...
Can someone explain to me why?
template<class O> void split(string& s, O oit){
string::iterator jt, it = s.begin();
while(1){
jt = find(it, s.end(), ' ');
if(it == s.end() && jt == s.end()) return;
*oit++ = string(it, jt);
it = jt;
if(it != s.end() ) it++;
}
}
...
int main(){
string s;
getline(cin, s);
vector<string> v;
split(s, back_inserter(v));
copy(v.begin(), v.end(), ostream_iterator<string>(cout, "\n"));
}
std::back_inserter creates an iterator that inserts by calling push_back on the underlying collection. That means the increment isn't necessary for it to work correctly.
The same is not necessarily true of other output iterators though, so for your code to be correct, it should perform the increment, even though in this particular case it's basically ignored.
Just for what it's worth, in this particular case you can get the same basic effect with a lot less/simpler code:
string s;
getline(cin, s);
replace(s.begin(), s.end(), ' ', '\n'); // std::replace takes an iterator range
cout << s;
The concept requires that you increment an output iterator for each write you do. Although the std::back_insert_iterator<T> may call push_back() on the corresponding object on each assignment to *it, the concept still demands that the increment operator is called. In principle, output iterators could be function calls but to fit into the same interface used also by pointers, they need to support the same operations.
The specification of std::back_insert_iterator<Cont> states that each assignment of a typename Cont::value_type calls cont.push_back(value) on the underlying container; operator*() just returns the iterator itself, as does operator++().
Both parts work because standard iterators are designed to be functionally equivalent to raw pointers when used with generic programming. When raw pointers are used, they must be incremented to reach the subsequent address.
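A sketch of why the increment matters for output iterators in general. The helpers split_words and split_words_no_increment are hypothetical, simplified variants of the split function from the question.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Writes each space-separated word of s through out, incrementing after
// every write, as the output iterator concept requires.
template <class OutIt>
OutIt split_words(const std::string& s, OutIt out) {
    std::string::const_iterator it = s.begin();
    while (it != s.end()) {
        std::string::const_iterator jt = std::find(it, s.end(), ' ');
        *out++ = std::string(it, jt); // write, then advance
        it = jt;
        if (it != s.end()) ++it;      // skip the space
    }
    return out;
}

// Same loop, but the output iterator is never incremented.
template <class OutIt>
void split_words_no_increment(const std::string& s, OutIt out) {
    std::string::const_iterator it = s.begin();
    while (it != s.end()) {
        std::string::const_iterator jt = std::find(it, s.end(), ' ');
        *out = std::string(it, jt);   // write only
        it = jt;
        if (it != s.end()) ++it;
    }
}
```

With std::back_inserter both versions append three words for the input "one two three", because every assignment calls push_back. With an iterator into a pre-sized vector, the version without the increment keeps overwriting the first slot, so only the last word survives.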
vector<int> a;
1.
for(vector<int>::iterator it = a.begin(); it != a.end(); ++it)
2.
vector<int>::iterator end = a.end();
for(vector<int>::iterator it = a.begin(); it != end; ++it)
Which is more efficient? Or are they the same?
Initial criticisms:
1/ Typical tutorial example
for(vector<int>::iterator it = a.begin(); it != a.end(); ++it)
There is no magic, but it raises a question: is a ever modified inside the loop, so that the end bound may change?
2/ Improved
vector<int>::iterator end = a.end();
for(vector<int>::iterator it = a.begin(); it != end; ++it)
a.end() is only executed once, it seems. However since end is not const, it may be modified inside the loop.
Furthermore, it introduces the end identifier in the outer scope, polluting it.
So there is a potential gain in performance, but not much in clarity. Also, it's far more verbose.
I would propose several other ways:
3/ Best Manual
for(vector<int>::iterator it = a.begin(), end = a.end(); it != end; ++it)
Combines the advantages of v1 (quite terse, no outer scope pollution) and v2 (performance), however it is still unclear if end is ever modified within the loop body.
4/ Boost-powered
BOOST_FOREACH(int& i, a)
Even terser than v1, immediately identifiable at a glance, no outer scope leak, and guarantee of full iteration (it's not possible to modify the bounds).
Unfortunately:
there are issues with commas in the type of the variable (because it relies on the preprocessor)
compile-time errors are completely cryptic (because it relies on the preprocessor)
Note: in theory, one could make the case for the std::for_each algorithm here, but honestly... there is too much effort involved in defining a function object outside the loop, and it breaks code locality.
5/ C++11 range-for statement
for (int& i: a)
All the advantages:
Extremely Terse
As performant as the best C++ hand-written loop
Guaranteed full iteration, no questions asked
And none of the issues (scope leak, preprocessor magic).
Personally, I use C++11 range-for whenever I can (hobby projects) and BOOST_FOREACH otherwise (at work).
I avoid like the plague modifying the container I am iterating on, preferring to rely on STL algorithms when I need to filter/remove elements... It's too easy to mess up with the boundary conditions and iterator invalidations otherwise.
2nd is more efficient as it only requires creating the end iterator once.
A smart compiler may optimize the first one to be the second, but you cannot be guaranteed that that will happen.
It would actually be a bit of a complicated optimization because the compiler would need to be 100% certain that any subsequent call to end() will have no additional effects or return anything different. Basically, it would need to know that at least over the loop, end() always returns something such that end() == previous call to end(). Whether or not compilers do that optimization is not guaranteed.
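To illustrate why the compiler cannot hoist end() blindly, here is a sketch of a loop whose body changes what end() returns. The function visit_while_appending is invented for this example. It relies on reserve() so that push_back never reallocates; in that case iterators before the insertion point stay valid, while the past-the-end iterator is invalidated on every push_back.

```cpp
#include <cstddef>
#include <vector>

// Visits every element of v. For each of the ORIGINAL elements it appends
// ten times that value, so the range grows while it is being traversed.
// Returns how many elements were visited in total.
std::size_t visit_while_appending(std::vector<int>& v) {
    const std::size_t original = v.size();
    v.reserve(original * 2); // enough room: no reallocation below

    std::size_t visited = 0;
    // v.end() must be re-evaluated: each push_back invalidates the old
    // past-the-end iterator, so a cached copy could not be used here.
    for (std::vector<int>::iterator it = v.begin(); it != v.end(); ++it) {
        if (visited < original) v.push_back(*it * 10);
        ++visited;
    }
    return visited;
}
```

Starting from {1, 2, 3}, the loop visits six elements and leaves {1, 2, 3, 10, 20, 30} behind. A version that compared against an end iterator captured before the loop would be using an invalidated iterator.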
The 2nd way is obviously better, as it calls a.end() only once. In essence, if there are N elements in your container then you save N calls to a.end().
I think that the first for loop is more certain. In case you insert/erase elements inside this for loop the end iterator you have defined is invalidated. For example:
vector<int>::iterator mend = int_vec.end(), mbegin = int_vec.begin();
while(mbegin != mend)
{
cout << *mbegin << " ";
int_vec.erase(mbegin);
// mbegin and mend are both invalidated by erase
// using them again is undefined behavior, which often shows up as a bizarre runtime error
// never try this at home !
}
A safer version of the code above could be this:
vector<int>::iterator mbegin = int_vec.begin();
while(mbegin != int_vec.end()) // end() is re-evaluated, because erase invalidates it
{
cout << *mbegin << " ";
mbegin = int_vec.erase(mbegin); // ok, erase returns a valid iterator to the next element
}
In C++11, if we have a set<int> S; we could say:
for (auto i: S)
cout << i << endl;
But can we force i to be an iterator? I mean, write code that is equivalent to:
for (auto i = S.begin(); i != S.end(); i++)
cout << (i != S.begin() ? " " : "") << *i;
Or could we do something that lets us know the index of i in the set (or vector)?
Another question: how could we say that we don't want to do this for all elements of S, but only for the first half of them, or for all of them except the first one?
Or, when we have a vector<int> V and want to print its first n values, what should we do? I know we can create a new vector, but it takes time to copy a vector to a new vector.
Unfortunately, no. See what the standard says:
The range-based for statement
for ( for-range-declaration : expression ) statement
is equivalent to
{
auto && __range = ( expression );
for ( auto __begin = begin-expr, __end = end-expr; __begin != __end; ++__begin ) {
for-range-declaration = *__begin;
statement
}
}
where __range, __begin, and __end are variables defined for exposition only
In other words, it already iterates from begin to end and already dereferences the iterator, which you never get to see.
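The expansion quoted from the standard can be written out by hand. This is only a sketch: the names range, b and e stand in for the exposition-only __range, __begin and __end, and the two helper functions are made up for the comparison.

```cpp
#include <set>
#include <vector>

// Range-based for: the loop variable is the element, never the iterator.
std::vector<int> collect_range_for(const std::set<int>& s) {
    std::vector<int> out;
    for (int x : s) out.push_back(x);
    return out;
}

// Hand-written equivalent of what the range-based for expands to.
std::vector<int> collect_expanded(const std::set<int>& s) {
    std::vector<int> out;
    {
        auto&& range = s;
        for (auto b = range.begin(), e = range.end(); b != e; ++b) {
            int x = *b; // the for-range-declaration
            out.push_back(x);
        }
    }
    return out;
}
```

Both functions produce the same sequence. Only the second one has a name for the iterator, which is why you need the explicit form when you want the iterator itself.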
The principle of the range-based for is to iterate over the whole range.
However you decide what the range is, therefore you can operate on the range itself.
template <typename It>
class RangeView {
public:
typedef It iterator;
RangeView(): _begin(), _end() {}
RangeView(iterator begin, iterator end): _begin(begin), _end(end) {}
iterator begin() const { return _begin; }
iterator end() const { return _end; }
private:
iterator _begin;
iterator _end;
};
template <typename C>
RangeView<typename C::iterator> rangeView(C& c, size_t begin, size_t end) {
return RangeView<typename C::iterator>(
std::next(c.begin(), begin),
std::next(c.begin(), end)
);
}
template <typename C>
RangeView<typename C::const_iterator> rangeView(C const& c, size_t begin, size_t end) {
return RangeView<typename C::const_iterator>(
std::next(c.begin(), begin),
std::next(c.begin(), end)
);
}
Okay, this seriously resembles Boost.Range...
And now, let's use it!
for (auto i: rangeView(set, 1, 10)) {
// iterate over the second through the tenth element (indices 1 to 9)
}
No, you can't.
for (... : ...)
is called for instead of foreach only for the reason of not introducing a new keyword. The whole point of foreach is a quick short syntax for iterating all elements without caring for their index. For all other situations there's simple for which serves its purpose quite effectively.
For the general case, you'd have to use a separate variable:
int i = 0;
for (auto x : s)
cout << (i++ ? " " : "") << x << endl;
There are, of course, tricks for certain containers like vector, but none work for every container.
You would probably be better off using the plain for loop for this purpose.
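As a sketch of the plain for loop approach, the hypothetical helper join_first_n below walks a std::set with an explicit iterator and a separate index counter. It stops after n elements and puts a space before every element except the first, without copying the container.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>

// Joins at most the first n elements of s with single spaces.
std::string join_first_n(const std::set<int>& s, std::size_t n) {
    std::ostringstream out;
    std::size_t index = 0;
    for (std::set<int>::const_iterator it = s.begin();
         it != s.end() && index < n; ++it, ++index) {
        if (it != s.begin()) out << ' '; // separator before all but the first
        out << *it;
    }
    return out.str();
}
```

The same loop shape works for a vector; there the index could also be computed from the iterator as it - v.begin().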
You can't in a set. Use the traditional for syntax or maintain your own index counter.
You can in a vector or another container with a contiguous layout, such as std::array or a C-style array. Change the loop to use a reference:
for (auto &i: S)
Then you can compare the address of i with the address of S[0] to get the index.
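A minimal sketch of the address trick, assuming a std::vector<int> (it does not apply to node-based containers such as std::set, nor to std::vector<bool>). The function name indices_of is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices at which target occurs in v, using a range-based
// for loop and recovering the index from the element's address.
std::vector<std::size_t> indices_of(const std::vector<int>& v, int target) {
    std::vector<std::size_t> found;
    for (const int& x : v) {
        // &x and v.data() point into the same contiguous array, so the
        // pointer difference is the index of x.
        std::size_t index = static_cast<std::size_t>(&x - v.data());
        if (x == target) found.push_back(index);
    }
    return found;
}
```

The loop variable has to be a reference; with a by-value loop variable, &x would be the address of a local copy and the subtraction would be meaningless.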
Range-based for is intended for simple cases. I'd expect it to be mildly useful while prototyping something, but would expect uses of it to be mostly gone long before things actually become a product. It may possibly be useful to make life easier for beginners, but this is an area I can't judge (though it seems to drive a lot of the recent C++ discussions).
The only somewhat constructive approach could be to use an adapter which references the underlying range and whose begin() and end() methods adjust the iterator appropriately. Also note that you probably want to hoist any special handling of the first or last element out of the loop processing the bulk of the data. Sure, it is only another check followed by a correctly predicted branch vs. no check and less pollution of the branch prediction tables.
Examples showing how to iterate over a std::map often look like this:
MapType::const_iterator end = data.end();
for (MapType::const_iterator it = data.begin(); it != end; ++it)
i.e. it uses ++it instead of it++. Is there any reason why? Could there be any problem if I use it++ instead?
it++ returns a copy of the previous iterator. Since this iterator is not used, this is wasteful. ++it returns a reference to the incremented iterator, avoiding the copy.
Please see Question 13.15 for a fuller explanation.
Putting it to the test, I made three source files:
#include <map>
struct Foo { int a; double b; char c; };
typedef std::map<int, Foo> FMap;
### File 1 only ###
void Set(FMap & m, const Foo & f)
{
for (FMap::iterator it = m.begin(), end = m.end(); it != end; ++it)
it->second = f;
}
### File 2 only ###
void Set(FMap & m, const Foo & f)
{
for (FMap::iterator it = m.begin(); it != m.end(); ++it)
it->second = f;
}
### File 3 only ###
void Set(FMap & m, const Foo & f)
{
for (FMap::iterator it = m.begin(); it != m.end(); it++)
it->second = f;
}
### end ###
After compiling with g++ -S -O3, GCC 4.6.1, I find that version 2 and 3 produce identical assembly, and version 1 differs only in one instruction, cmpl %eax, %esi vs cmpl %esi, %eax.
So, take your pick and use whatever suits your style. Prefix increment ++it is probably best because it expresses your requirements most accurately, but don't get hung up about it.
There’s a slight performance advantage in using pre-increment operators versus post-increment operators. In setting up loops that use iterators, you should opt to use pre-increments:
for (list<string>::const_iterator it = tokens.begin();
it != tokens.end();
++it) { // Don't use it++
...
}
The reason comes to light when you think about how both operators would typically be implemented. The pre-increment is quite straightforward. However, in order for post-increment to work, you need to first make a copy of the object, do the actual increment on the original object and then return the copy:
class MyInteger {
private:
int m_nValue;
public:
MyInteger(int i) {
m_nValue = i;
}
// Pre-increment
const MyInteger &operator++() {
++m_nValue;
return *this;
}
// Post-increment
MyInteger operator++(int) {
MyInteger clone = *this; // Copy operation 1
++m_nValue;
return clone; // Copy operation 2
}
}; // a class definition must end with a semicolon
As you can see, the post-increment implementation involves two extra copy operations. This can be quite expensive if the object in question is bulky. Having said that, some compilers may be smart enough to get away with a single copy operation, through optimization. The point is that a post-increment will typically involve more work than a pre-increment and therefore it’s wise to get used to putting your “++”s before your iterators rather than after.
(1) Credit to linked website.
From a logical point of view, they are the same, and it doesn't matter here.
The prefix form is used because it can be faster: it changes the iterator and returns it, while the postfix form creates a temporary object, increments the current iterator, and then returns the temporary (a copy of the iterator as it was before the increment). As nothing looks at this temporary (the return value) here, the two are logically the same.
There's a pretty good chance that the compiler will optimize this.
In addition, this is supposed to hold for any type at all, but it is only a convention. Since anyone can overload both the postfix and the prefix operator++, they can have side effects and different behavior.
That would be a horrible thing to do, but it is still possible.
It won't cause any problems, but using ++it is more correct. With small types it doesn't really matter whether you use ++i or i++, but for "big" classes:
type operator++(type& x, int){
type tmp=x; //need copy
++x;
return tmp;
}
The compiler may optimize out some of these copies, but it's hard to be sure.
As other answers have said, prefer ++it unless it won't work in context. For iterating over containers of small types it really makes little difference (or no difference if the compiler optimizes it away), but for containers of large types it can make a difference due to saving the cost of making a copy.
True, you might know in your specific context that the type is small enough so you don't worry about it. But later, someone else on your team might change the contents of the container to where it would matter. Plus, I think it is better to get yourself into a good habit, and only post-increment when you know you must.
I'm working with iterators in C++ and I'm having some trouble. I get "Debug Assertion Failed" on the expression (this->_Has_container()) at the line interIterator++.
distanceList is a vector< vector< DistanceNode > >. What am I doing wrong?
vector< vector<DistanceNode> >::iterator externIterator = distanceList.begin();
while (externIterator != distanceList.end()) {
vector<DistanceNode>::iterator interIterator = externIterator->begin();
while (interIterator != externIterator->end()){
if (interIterator->getReference() == tmp){
//remove element pointed by interIterator
externIterator->erase(interIterator);
} // if
interIterator++;
} // while
externIterator++;
} // while
vector's erase() returns a new iterator to the next element. All iterators to the erased element and to elements after it become invalidated. Your loop ignores this, however, and continues to use interIterator.
Your code should look something like this:
if (condition)
interIterator = externIterator->erase(interIterator);
else
++interIterator; // (generally better practice to use pre-increment)
You can't remove elements from a sequence container while iterating over it — at least not the way you are doing it — because calling erase invalidates the iterator. You should assign the return value from erase to the iterator and suppress the increment:
while (interIterator != externIterator->end()){
if (interIterator->getReference() == tmp){
interIterator = externIterator->erase(interIterator);
} else {
++interIterator;
}
}
Also, never use post-increment (i++) when pre-increment (++i) will do.
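Put together as a compilable sketch, with plain int standing in for the question's DistanceNode and a made-up function name erase_value:

```cpp
#include <vector>

// Removes every occurrence of value from each inner vector.
void erase_value(std::vector<std::vector<int> >& rows, int value) {
    for (std::vector<std::vector<int> >::iterator row = rows.begin();
         row != rows.end(); ++row) {
        std::vector<int>::iterator it = row->begin();
        while (it != row->end()) {
            if (*it == value)
                it = row->erase(it); // erase returns the next valid iterator
            else
                ++it;                // only advance when nothing was erased
        }
    }
}
```

The inner loop advances in exactly one of the two branches, so no element is skipped after an erase and the iterator never goes past end().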
I'll take the liberty to rewrite the code:
class ByReference: public std::unary_function<DistanceNode, bool> // argument type first, then result type
{
public:
explicit ByReference(const Reference& r): mReference(r) {}
bool operator()(const DistanceNode& node) const
{
return node.getReference() == mReference;
}
private:
Reference mReference;
};
typedef std::vector< std::vector< DistanceNode > >::iterator iterator_t;
for (iterator_t it = dl.begin(), end = dl.end(); it != end; ++it)
{
it->erase(
std::remove_if(it->begin(), it->end(), ByReference(tmp)),
it->end()
);
}
Why ?
The first loop (externIterator) iterates over the full range of elements without ever modifying the range itself. That is what a for loop is for, and this way you won't forget to increment. (Admittedly a for_each would be better, but the syntax can be awkward.)
The second loop is tricky: simply put, you are cutting off the branch you're sitting on when you call erase, which requires jumping around (using the returned value). In this case the operation you want to accomplish (purging the list according to a certain criterion) is exactly what the remove-erase idiom is tailored for.
Note that the code could be tidied up if we had true lambda support at our disposal. In C++0x we would write:
std::for_each(distanceList.begin(), distanceList.end(),
[&tmp](std::vector<DistanceNode>& vec)
{
vec.erase(
std::remove_if(vec.begin(), vec.end(),
[&tmp](const DistanceNode& dn) { return dn.getReference() == tmp; }
),
vec.end()
);
}
);
As you can see, we don't see any iterator incrementing / dereferencing taking place any longer, it's all wrapped in dedicated algorithms which ensure that everything is handled appropriately.
I'll grant you the syntax looks strange, but I guess it's because we are not used to it yet.
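For reference, a self-contained sketch of the same idea that compiles as C++11. The Node struct is a hypothetical stand-in for the question's DistanceNode, with an int reference instead of the unknown Reference type, and purge is an invented name. A range-based for replaces the outer std::for_each.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for DistanceNode.
struct Node {
    int reference;
    int getReference() const { return reference; }
};

// Removes every node whose reference equals tmp from each inner vector.
void purge(std::vector<std::vector<Node> >& lists, int tmp) {
    for (std::vector<Node>& vec : lists) {
        vec.erase(
            // remove_if moves the kept elements to the front and returns
            // the start of the leftover tail, which erase then cuts off.
            std::remove_if(vec.begin(), vec.end(),
                [&tmp](const Node& n) { return n.getReference() == tmp; }),
            vec.end());
    }
}
```

Note that a lambda capture list takes [&tmp] (or [tmp] for a copy); there is no const& capture syntax.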