The following code works as expected (the test passes), but I wonder whether working with iterators in this way is considered bad practice in C++ or whether it is okay.
Maybe this is specific to std::vector, and other collections behave differently, so that best practices vary between collections (or even their implementations)?
It certainly is not okay in other languages and most of the time changing a collection will invalidate iterators and throw exceptions.
BOOST_AUTO_TEST_CASE (ReverseIteratorExample) {
std::vector<int> myvector;
for(int i = 0; i < 5; i++)
{
myvector.push_back(i);
}
// is this generally a bad idea to change the vector while iterating?
// is it okay in this specific case?
myvector.reserve(myvector.size() + myvector.size() - 2 );
myvector.insert(myvector.end(), myvector.rbegin() + 1, myvector.rend() -1);
int resultset [8] = { 0,1,2,3,4,3,2,1 };
std::vector<int> resultVector( resultset, resultset + sizeof(resultset)/sizeof(resultset[0]) );
BOOST_CHECK_EQUAL_COLLECTIONS(myvector.begin(), myvector.end(), resultVector.begin(), resultVector.end());
}
Summarized Questions:
Is this generally a bad idea to change the vector while iterating?
Is it okay in this specific case?
Is this specific for std::vector and other collections behave differently?
Do best practices vary between collections (or even their implementations)?
This is not valid code. The standard's definition of operations on sequence containers states (23.2.3#4):
a.insert(p,i,j) - [...] pre: i and j are not iterators into a.
So your code invokes undefined behavior because it violates the precondition for the insert operation.
If instead of using insert, you wrote a loop iterating from myvector.rbegin() + 1 to myvector.rend() -1 and called push_back on all values, your code would be valid: This is because push_back only invalidates vector iterators if a reallocation is needed, and your call to reserve ensures that this is not the case.
In general, while there are some cases where modifying a container while iterating over it is fine (such as the loop described above), you have to make sure that your iterators aren't invalidated while doing so. When this happens is specific to each container.
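A minimal sketch of the push_back loop described above, assuming a vector of int. The helper name appendMirrored is made up for illustration and is not part of the original test. It relies on reserve being called first so that no push_back reallocates:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends the interior elements in reverse order,
// e.g. {0,1,2,3,4} becomes {0,1,2,3,4,3,2,1}.
void appendMirrored(std::vector<int>& v) {
    if (v.size() < 3) {
        return;  // no interior elements to mirror
    }
    const std::size_t finalSize = 2 * v.size() - 2;
    v.reserve(finalSize);  // the push_back calls below can no longer reallocate

    // The base iterators of these reverse iterators point at existing
    // elements rather than at end(), so push_back without reallocation
    // leaves them valid.
    auto first = v.rbegin() + 1;  // refers to the second-to-last element
    auto last = v.rend() - 1;     // loop stops before the first element
    for (auto it = first; it != last; ++it) {
        v.push_back(*it);
    }
    assert(v.size() == finalSize);
}
```

Calling appendMirrored on a vector holding 0,1,2,3,4 should leave it holding 0,1,2,3,4,3,2,1, the same result the original test checks for.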
It's very basic, but I could not find a similar question here. I am trying to iterate the same sorted STL list from different directions using list. I know I can compare an iterator to the list.begin() and list.end(), so why doesn't this work?
list<family>::iterator itLargeFamily =
families.begin(); //starts from the biggest families
list<family>::iterator itSmallFamily =
families.end(); //starts from the smallest families
for (; itSmallFamily > itLargeFamily; --itSmallFamily, ++itLargeFamily) {
// stuff...
}
The error is of course
no operator > matches these operands
100% chance I'm missing something basic.
Only random access iterators are ordered. std::list iterators are only bidirectional iterators, so they do not support operator< or operator>.
Instead, you could do your comparison with !=.
while (itSmallFamily != itLargeFamily)
You'll have to make sure that the iterators don't jump over each other for this to work though. That is, if itSmallFamily is only one increment away from itLargeFamily, you will simply swap them over and they'll never have been equal to each other.
You could instead use std::vector, whose iterators are random access iterators. In addition, std::array and std::deque also support random access.
From the comments and the answer of sftrabbit you can see that relational operators are only defined for random access iterators, and std::list has only bidirectional iterators. So there are several solutions for your problem:
Use std::vector or std::array. They provide random access iterators, perform better for small sizes (and, depending on how you fill and use them, for larger sizes as well), and have a smaller memory footprint. This is the preferred solution; I'd call it the "default" solution. Use other containers only if there is a very good, measurable reason (e.g. a profiler tells you that using that container is a performance bottleneck).
Since you know the size of the list, you can use a counter for your iterations:
for (size_t i = 0, count = families.size()/2;
i < count;
++i, --itSmallFamily, ++itLargeFamily)
{ /* do stuff */ }
Since your list is sorted, you can compare the elements the iterators point to instead of the iterators themselves.
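The counter-based approach can be sketched roughly as follows, assuming a list of int in place of the family type from the question. The function name pairFromBothEnds is hypothetical. Note that the back iterator starts at std::prev(end()), because end() itself must not be dereferenced:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <utility>
#include <vector>

// Hypothetical helper: walks a list from both ends at once and collects
// (front element, back element) pairs until the two cursors would meet.
std::vector<std::pair<int, int>> pairFromBothEnds(const std::list<int>& items) {
    std::vector<std::pair<int, int>> pairs;
    if (items.empty()) {
        return pairs;  // std::prev(end()) would be invalid on an empty list
    }
    auto front = items.begin();
    auto back = std::prev(items.end());  // last real element, not end()
    // A counter replaces the unsupported `back > front` comparison.
    for (std::size_t i = 0, count = items.size() / 2; i < count;
         ++i, ++front, --back) {
        pairs.emplace_back(*front, *back);
    }
    assert(pairs.size() == items.size() / 2);
    return pairs;
}
```

With an odd number of elements the middle one is left unpaired, which also sidesteps the "iterators jump over each other" problem mentioned above.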
I've implemented a merge function for vectors, which basically combines two sorted vectors into one sorted vector (yes, it is for a merge sort algorithm). I was trying to make my code faster and avoid overhead, so I decided not to use the push_back method on the vector and to use the array syntax instead, which has less overhead. However, something is going terribly wrong, and the output is messed up when I do this. Here's the code:
while(size1<left.size() && size2 < right.size()) //left and right are the input vectors
{
//it1 and it2 are iterators on the two sorted input vectors
if(*it1 <= *it2)
{
final.push_back(*it1); //final is the final vector to output
//final[count] = *it1; // this does not work for some reason
it1++;
size1++;
//cout<<"count ="<<count<<" size1 ="<<size1<<endl;
}
else
{
final.push_back(*it2);
//final[count] = left[size2];
it2++;
size2++;
}
count++;
//cout<<"count ="<<count<<" size1 ="<<size1<<"size2 = "<<size2<<endl;
}
It seems to me that the two methods should be functionally equivalent.
PS: I have already reserved space for the final vector, so that shouldn't be a problem.
You can't add new objects to a vector using operator[], and .reserve() doesn't add them either. You have to use either .resize() or .push_back().
Also, you are not really avoiding overhead: the call cost of operator[] isn't much better than that of push_back(), so until you profile your code thoroughly, just use push_back. You can still use reserve to make sure unnecessary allocations won't be made.
In most cases, "optimizations" like this don't really help. If you want to make your code faster, profile it first and look for the hot paths.
There is a huge difference between
vector[i] = item;
and
vector.push_back(item);
Differences:
The first one modifies the element at index i and i must be valid index. That is,
0 <= i < vector.size() must be true
If i is an invalid index, the first one invokes undefined behavior, which means ANYTHING can happen. You could, however, use at() which throws exception if i is invalid:
vector.at(i) = item; //throws exception if i is invalid
The second one adds an element to the vector at the end, which means the size of the vector increases by one.
Since the two do semantically different things, choose the one you need.
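To make the difference concrete for the merge from the question, here is a rough sketch of both variants, assuming vectors of int. The function names are invented for illustration. The first pairs reserve() with push_back(); the second pairs resize() with operator[], which is what the commented-out lines in the question would have needed:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Variant A: reserve() only sets capacity, so elements are added with push_back().
std::vector<int> mergeWithPushBack(const std::vector<int>& left,
                                   const std::vector<int>& right) {
    std::vector<int> merged;
    merged.reserve(left.size() + right.size());
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        if (left[i] <= right[j]) {
            merged.push_back(left[i++]);
        } else {
            merged.push_back(right[j++]);
        }
    }
    while (i < left.size()) merged.push_back(left[i++]);    // leftovers from left
    while (j < right.size()) merged.push_back(right[j++]);  // leftovers from right
    return merged;
}

// Variant B: resize() actually creates the elements, so every index
// written through operator[] is valid.
std::vector<int> mergeWithIndex(const std::vector<int>& left,
                                const std::vector<int>& right) {
    std::vector<int> merged;
    merged.resize(left.size() + right.size());
    std::size_t i = 0, j = 0, count = 0;
    while (i < left.size() && j < right.size()) {
        if (left[i] <= right[j]) {
            merged[count++] = left[i++];
        } else {
            merged[count++] = right[j++];
        }
    }
    while (i < left.size()) merged[count++] = left[i++];
    while (j < right.size()) merged[count++] = right[j++];
    assert(count == merged.size());
    return merged;
}
```

Both should produce the same output; which one is faster, if either, is something only a profiler can tell you.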
Possible Duplicate:
Why use iterators instead of array indices?
I'm reviewing my knowledge on C++ and I've stumbled upon iterators. One thing I want to know is what makes them so special and I want to know why this:
using namespace std;
vector<int> myIntVector;
vector<int>::iterator myIntVectorIterator;
// Add some elements to myIntVector
myIntVector.push_back(1);
myIntVector.push_back(4);
myIntVector.push_back(8);
for(myIntVectorIterator = myIntVector.begin();
myIntVectorIterator != myIntVector.end();
myIntVectorIterator++)
{
cout<<*myIntVectorIterator<<" ";
//Should output 1 4 8
}
is better than this:
using namespace std;
vector<int> myIntVector;
// Add some elements to myIntVector
myIntVector.push_back(1);
myIntVector.push_back(4);
myIntVector.push_back(8);
for(int y=0; y<myIntVector.size(); y++)
{
cout<<myIntVector[y]<<" ";
//Should output 1 4 8
}
And yes I know that I shouldn't be using the std namespace. I just took this example off of the cprogramming website. So can you please tell me why the latter is worse? What's the big difference?
The special thing about iterators is that they provide the glue between algorithms and containers. For generic code, the recommendation would be to use a combination of STL algorithms (e.g. find, sort, remove, copy) etc. that carries out the computation that you have in mind on your data structure (vector, list, map etc.), and to supply that algorithm with iterators into your container.
Your particular example could be written as a combination of the for_each algorithm and the vector container (see option 3) below), but it's only one out of four distinct ways to iterate over a std::vector:
1) index-based iteration
for (std::size_t i = 0; i != v.size(); ++i) {
// access element as v[i]
// any code including continue, break, return
}
Advantages: familiar to anyone familiar with C-style code, can loop using different strides (e.g. i += 2).
Disadvantages: only for sequential random access containers (vector, array, deque), doesn't work for list, forward_list or the associative containers. Also the loop control is a little verbose (init, check, increment). People need to be aware of the 0-based indexing in C++.
2) iterator-based iteration
for (auto it = v.begin(); it != v.end(); ++it) {
// if the current index is needed:
auto i = std::distance(v.begin(), it);
// access element as *it
// any code including continue, break, return
}
Advantages: more generic, works for all containers (even the new unordered associative containers), can also use different strides (e.g. std::advance(it, 2)).
Disadvantages: need extra work to get the index of the current element (could be O(N) for list or forward_list). Again, the loop control is a little verbose (init, check, increment).
3) STL for_each algorithm + lambda
std::for_each(v.begin(), v.end(), [&v](T const& elem) {
// if the current index is needed:
auto i = &elem - &v[0];
// cannot continue, break or return out of the loop
});
Advantages: same as 2) plus small reduction in loop control (no check and increment), this can greatly reduce your bug rate (wrong init, check or increment, off-by-one errors).
Disadvantages: same as explicit iterator-loop plus restricted possibilities for flow control in the loop (cannot use continue, break or return) and no option for different strides (unless you use an iterator adapter that overloads operator++).
4) range-for loop
for (auto& elem: v) {
// if the current index is needed:
auto i = &elem - &v[0];
// any code including continue, break, return
}
Advantages: very compact loop control, direct access to the current element.
Disadvantages: extra statement to get the index. Cannot use different strides.
What to use?
For your particular example of iterating over std::vector: if you really need the index (e.g. access the previous or next element, printing/logging the index inside the loop etc.) or you need a stride different than 1, then I would go for the explicitly indexed-loop, otherwise I'd go for the range-for loop.
For generic algorithms on generic containers I'd go for the explicit iterator loop unless the code contained no flow control inside the loop and needed stride 1, in which case I'd go for the STL for_each + a lambda.
With a vector, iterators do not offer any real advantage. The syntax is uglier, longer to type and harder to read.
Iterating over a vector using iterators is not faster and is not safer (actually, if the vector may be resized during the iteration, using iterators will get you into big trouble).
The idea of having a generic loop that keeps working if you later change the container type is also mostly nonsense in real cases. Unfortunately the dark side of a strictly typed language without serious type inference (a bit better now with C++11, however) is that you need to say what the type of everything is at each step. If you change your mind later you will still need to go around and change everything. Moreover, different containers have very different trade-offs, and changing the container type is not something that happens that often.
The only case in which iteration should be kept generic, if possible, is when writing template code, but that (I hope for you) is not the most frequent case.
The only problem present in your explicit index loop is that size returns an unsigned value (a design bug of C++) and comparison between signed and unsigned is dangerous and surprising, so better avoided. If you use a decent compiler with warnings enabled there should be a diagnostic on that.
Note that the solution is not to use an unsigned index, because arithmetic on unsigned values is also apparently illogical (it's modulo arithmetic, and x-1 may be bigger than x). You should instead cast the size to an int before using it.
It may make some sense to use unsigned sizes and indexes (paying a LOT of attention to every expression you write) only if you're working on a 16 bit C++ implementation (16 bit was the reason for having unsigned values in sizes).
As a typical mistake that unsigned size may introduce consider:
void drawPolyline(const std::vector<P2d>& points)
{
for (int i=0; i<points.size()-1; i++)
drawLine(points[i], points[i+1]);
}
Here the bug is that if you pass an empty points vector, the value points.size()-1 will be a huge positive number, sending the loop into a segfault.
A working solution could be
for (int i=1; i<points.size(); i++)
drawLine(points[i - 1], points[i]);
but I personally prefer to always remove unsigned-ness with int(v.size()).
PS: If you really don't want to think through the implications yourself and simply want an expert to tell you, then consider that quite a few world-recognized C++ experts agree, and have said so, that unsigned values are a bad idea except for bit manipulation.
Discovering the ugliness of using iterators in the case of iterating up to second-last is left as an exercise for the reader.
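The empty-vector pitfall in drawPolyline can be checked in isolation. The sketch below is only an illustration: it uses a vector of int as a stand-in for the points, counts segments instead of drawing them, and assumes the size fits in an int. Both function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for drawPolyline: counts the segments it would draw,
// using the "cast the size to int once" style preferred above.
int countSegmentsSignedSize(const std::vector<int>& points) {
    const int n = static_cast<int>(points.size());  // assumes size fits in int
    int segments = 0;
    for (int i = 0; i < n - 1; ++i) {
        ++segments;  // a real version would draw points[i] -> points[i + 1]
    }
    assert(segments == (n > 0 ? n - 1 : 0));
    return segments;
}

// What the buggy loop bound `points.size() - 1` evaluates to.
std::size_t buggyUpperBound(const std::vector<int>& points) {
    return points.size() - 1;  // wraps around to a huge value when size() == 0
}
```

For an empty vector the signed version runs zero iterations, while buggyUpperBound returns the largest value a std::size_t can hold.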
Iterators make your code more generic.
Every standard library container provides an iterator, hence if you change your container class in the future the loop won't be affected.
Iterators are first choice over operator[]. C++11 provides std::begin(), std::end() functions.
As your code uses just std::vector, I can't say there is much difference between the two versions. However, operator[] may not behave as you intend. For example, if you use a map, operator[] will insert an element if the key is not found.
Also, by using iterators your code becomes more portable between containers. You can switch from std::vector to std::list or another container freely without changing much if you use iterators; the same does not apply to operator[].
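The std::map behaviour mentioned above is easy to demonstrate. A small sketch, with made-up helper names, comparing a lookup through find() with one through operator[]:

```cpp
#include <cassert>
#include <map>
#include <string>

// Lookup that never modifies the map.
bool containsViaFind(const std::map<std::string, int>& m, const std::string& key) {
    return m.find(key) != m.end();
}

// Lookup through operator[]: if the key is missing, a value-initialized
// int (0) is inserted first and then returned.
int valueViaSubscript(std::map<std::string, int>& m, const std::string& key) {
    const int value = m[key];
    assert(m.count(key) == 1);  // the key exists now, whether or not it did before
    return value;
}
```

Note that valueViaSubscript cannot even take the map by const reference, because std::map::operator[] is a non-const member.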
It always depends on what you need.
You should use operator[] when you need direct access to elements in the vector (when you need to index a specific element in the vector). There is nothing wrong with using it instead of iterators. However, you must decide for yourself which (operator[] or iterators) best suits your needs.
Using iterators would enable you to switch to other container types without much change in your code. In other words, using iterators would make your code more generic, and does not depend on a particular type of container.
By writing your client code in terms of iterators you abstract away the container completely.
Consider this code:
class ExpressionParser // some generic arbitrary expression parser
{
public:
template<typename It>
void parse(It begin, const It end)
{
using namespace std;
using namespace std::placeholders;
for_each(begin, end,
bind(&ExpressionParser::process_next, this, _1));
}
// process next char in a stream (defined elsewhere)
void process_next(char c);
};
client code:
ExpressionParser p;
std::string expression("SUM(A) FOR A in [1, 2, 3, 4]");
p.parse(expression.begin(), expression.end());
std::ifstream file("expression.txt");
p.parse(std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());
char expr[] = "[12a^2 + 13a - 5] with a=108";
p.parse(std::begin(expr), std::end(expr));
Edit: Consider your original code example, implemented with a standard algorithm (std::copy):
using namespace std;
vector<int> myIntVector;
// Add some elements to myIntVector
myIntVector.push_back(1);
myIntVector.push_back(4);
myIntVector.push_back(8);
copy(myIntVector.begin(), myIntVector.end(),
std::ostream_iterator<int>(cout, " "));
The nice thing about iterators is that if you later want to switch your vector to another standard container, the for loop will still work.
It's a matter of speed. Using the iterator accesses the elements faster. A similar question was answered here:
What's faster, iterating an STL vector with vector::iterator or with at()?
Edit:
The speed of access varies with each CPU and compiler.
How come that random deletion from a std::vector is faster than a std::list? What I'm doing to speed it up is swapping the random element with the last and then deleting the last.
I would have thought that the list would be faster since random deletion is what it was built for.
for(int i = 500; i < 600; i++){
swap(vector1[i], vector1[vector1.size()-1]);
vector1.pop_back();
}
for(int i = 0; i < 100; i++){
list1.pop_front();
}
Results (in seconds):
Vec swap delete: 0.00000909461232367903
List normal delete: 0.00011785102105932310
What you're doing is not random deletion though. You're deleting from the end, which is what vectors are built for (among other things).
And when swapping, you're doing a single random indexing operation, which is also what vectors are good at.
The difference between std::list and std::vector is not merely down to performance. They also have different iterator invalidation semantics. If you erase an item from a std::list, all iterators pointing to other items in the list remain valid. Not so with std::vector, where erasing an item invalidates all iterators pointing after that item. (In some implementations, they may still serve as valid iterators, but according to the standard they are now unusable, and a checking implementation ought to assert if you try to use them.)
So your choice of container is also to do with what semantics you require.
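The invalidation difference can be sketched with a small helper (the name is hypothetical) that holds an iterator to the last element of a std::list while erasing the first:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Erases the first element while holding an iterator to the last one,
// then reads through that iterator.
int lastAfterErasingFirst(std::list<int>& values) {
    assert(values.size() >= 2);
    auto last = std::prev(values.end());
    values.erase(values.begin());  // only iterators to the erased node are invalidated
    return *last;                  // still valid for std::list
}
```

Writing the same function for std::vector would not be valid: erasing the first element invalidates every iterator at or after the erase position, so last could no longer be used.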
That's not random. Try vector1.erase(vector1.begin() + rand() % vector1.size()); instead.
Erasing from the list deletes the erased list element, which invokes a call to the delete operator. The vector erase in your code is just a swap and then an integer decrement, which is a lot faster.
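The swap-then-pop trick from the question can be wrapped in a small helper. This is only a sketch for a vector of int, the name unorderedErase is invented here, and it applies only when the order of the remaining elements does not matter:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Removes v[index] in constant time by swapping it with the last element
// and popping the back. The order of the remaining elements changes.
void unorderedErase(std::vector<int>& v, std::size_t index) {
    assert(index < v.size());
    std::swap(v[index], v.back());
    v.pop_back();
}
```

Erasing index 1 from 1,2,3,4,5 leaves 1,5,3,4, whereas vector::erase would keep the order but has to shift every later element down by one.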
Actually if you want further speed-ups you should index elements in the vector via iterators. They are known to have better performance for some architectures.