Confused with C++ iterator

I have 4 scenarios with C++ iterators which together confuse me.
Here's my main code:
int arr[] = {13, 20, 40};
set<int> st(arr, arr + 3);
auto it = st.begin();
auto tmp = it;
it++;
st.erase(it);
// scenarios here
1- If I write the following, the result is 20. Why?
cout << *tmp << endl;
2- If I move the iterator forward, the result is 13. Why?
tmp++;
cout << *tmp << endl;
3- If I move the iterator backward, the result is also 13. Why is it the same as moving forward?
tmp--;
cout << *tmp << endl;
4- Finally, if instead of erasing something from the middle of the set I erase the first item, the result is a random number.
auto it = st.begin();
auto tmp = it;
st.erase(it);
cout << *tmp << endl; // result: 13
tmp++;
cout << *tmp << endl; // result: random number
If you know any useful links about iterators in C++ related to this issue, please mention them.

If you delete the entry in a set that an iterator points to, you have invalidated the iterator. Once an iterator has been invalidated, you should no longer use it, as doing so causes undefined behavior.
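As an aside, here is a minimal sketch of the safe pattern. In std::set, erasing invalidates only iterators to the erased element, and since C++11 erase returns the iterator following it, so you never need to reuse a stale one:

#include <iostream>
#include <set>

int main()
{
    std::set<int> st = {13, 20, 40};
    auto it = st.begin();      // points at 13
    ++it;                      // points at 20
    it = st.erase(it);         // erases 20; C++11 erase returns the next valid iterator
    std::cout << *it << '\n';  // prints 40; the pre-erase value of 'it' must not be reused
    return 0;
}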

First, you don't mention your compiler. In a question like "how do I get C++ to do X, Y and Z" that is OK, but in a question like this, where you ask why your program does not do what you expect, you should specify the compiler (and version!).
The reason is simple. First, there may be a compiler bug that causes certain code to not do what it is supposed to do. Second, there are loose areas of the standard. For example, in this case there is no mention of how set is implemented, just how the interface is supposed to behave. So set can be implemented as a tree, an array, a hash table, whatever. Those choices change how set acts, especially in "undefined" cases.
Second, when you show multiple test cases, as here, it is useful to separate the test cases with macros:
#ifdef TEST1
cout << *tmp << endl;
#endif
#ifdef TEST2
tmp++;
cout << *tmp << endl;
#endif
#ifdef TEST3
tmp--;
cout << *tmp << endl;
#endif
Generally you should avoid macros, but conditional compilation is one of the few exceptions to that rule.
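The test case can then be chosen on the command line without touching the source (file name hypothetical):

g++ -DTEST2 test.cpp -O2 && a.exe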
So now on to your question.
First. I tend to use mostly vector, so I tend to think of iterators as C-style pointers and of ++ and -- as pointer arithmetic. It's hard to see how that could not be the case for vector, but it is not always the case, and that messes with my instincts. Nevertheless it is a good picture to have in your head.
Just keep in mind that it's only mostly accurate. Depending on how set is implemented, its iterators will differ. In the case of a tree, the iterator will hold a pointer to a node, ++ will follow a link to the next node (the in-order successor), and -- will follow a link to the previous one.
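To make that picture concrete, here is a toy sketch of a node-based iterator; the names and layout are purely illustrative, not how any real standard library does it:

// A node links to its in-order neighbours, so ++/-- are pointer hops,
// not pointer arithmetic.
struct Node { int value; Node* next; Node* prev; };

struct NodeIterator {
    Node* p;  // the pointer hidden inside the iterator
    int& operator*() const { return p->value; }
    NodeIterator& operator++() { p = p->next; return *this; }  // successor
    NodeIterator& operator--() { p = p->prev; return *this; }  // predecessor
    bool operator!=(const NodeIterator& rhs) const { return p != rhs.p; }
};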
Second. There are two types of collections: collections (which I will call basic collections) that are based on "primitive" data types, and virtual collections, which are things like directory listings or SQL result sets. In the first case, there is almost always a pointer hidden somewhere in the bowels of the iterator (or not so deep). Even in the second case, there is some sort of pointer-like object (a file descriptor/handle, a SQL cursor). So despite the fact that an iterator is put in an undefined state, it will still point to something, though that something might be invalid. It's like a character pointer that points to the wrong spot: there are still characters there, they just might be junk.
Third. To understand why iterators are made inconsistent when a collection changes, think about what you would have to do to create an iterator that remains consistent after a container change. First, you would need an observer pattern implemented in the container and the iterator, to let the iterator know when the container gets changed. An iterator is supposed to be a lightweight object; this alone would cause at least a doubling of iterator size, and the message passing would add overhead whenever iterators are created.
The standards committee has a rule: any feature that you don't use should not have overhead when not used. That would mean you would need separate classes, or at least a way to distinguish, containers that allow iterators to stay consistent when the container changes. That would mean an extra level of complexity in the STL.
There is also the problem of how to fix an inconsistent iterator. Assume that the element pointed to by the iterator is deleted. What do you do? Go back one? Go forward one? What if the collection is unordered? Then changing the collection can change the iteration order. How do you make sure that the iterator hits all nodes?
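For what it's worth, the practical idiom sidesteps the repair problem entirely: instead of fixing a stale iterator, let erase hand you the next valid one. A minimal sketch (C++11, where set::erase returns an iterator):

#include <set>

void erase_even(std::set<int>& s)
{
    for (auto it = s.begin(); it != s.end(); ) {
        if (*it % 2 == 0)
            it = s.erase(it);  // erase returns the next valid iterator
        else
            ++it;              // advance only when nothing was erased
    }
}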
Fourth. When doing something like this (a teaching program), it's a good idea to step through the code with a debugger to get an idea of what's going on inside.
Sorry it took so long, but the answer was more complicated than I thought. That is not new for C++. Welcome to it.

Related

How (is it possible) to access class members using iterators in C++?

Before, when using a normal for-loop, I would access it like this:
this->customClassCustomVectorArray[i]->getCustomMember();
But now I don't know how to access it, because when I type "->" VS2010 doesn't offer any members or methods.
for (vector<CustomClass*>::iterator it = this->customClassCustomVectorArray.begin(); it != this->customClassCustomVectorArray.end(); ++it) {
    cout << this->customClassCustomVectorArray[*it]->getCustomMember() << endl;
}
I've tried "*it" and "it" but nothing happens. Is this even possible? I suppose it should be.
cout << this->customClassCustomVectorArray[*it]->getCustomMember() << endl;
That is not right. [] expects a numeric index into the vector, but *it is not a numeric index; it is the element itself. The iterator already points to the correct position in the vector.
cout << (*it)->getCustomMember() << endl;
[] is only for when you are iterating using numeric indices, not iterators.
Wrong:
this->customClassCustomVectorArray[*it]->getCustomMember()
Right:
(*it)->getCustomMember()
Why? Because it is very much like
this->customClassCustomVectorArray + i
In fact, it's exactly the same as
this->customClassCustomVectorArray.begin() + i
so you don't need to say
this->customClassCustomVectorArray
twice.
While @Lightness's answer is correct as far as it goes, it still doesn't point toward how you should be doing this. Most times that you write an explicit loop using iterators, it's a mistake (and this is no exception). Most uses of std::endl are also mistakes (and this doesn't look like an exception in that respect either).
So, instead of writing an explicit loop, then explicitly dereferencing the iterator, etc., and finally taking his advice (correct though it is) to get the syntax correct, you should be thinking in terms of getting rid of essentially all of that, and using generic algorithms to do the job instead.
In this case, you have an input collection. For each item in that input collection, you're invoking a function, then writing the result to some output collection. That pretty much describes std::transform, so it's probably a decent fit for the job at hand.
std::transform(customClassCustomVectorArray.begin(),
               customClassCustomVectorArray.end(),
               std::ostream_iterator<result_type>(std::cout, "\n"),
               std::mem_fun(&CustomClass::getCustomMember));
This uses result_type to represent the type of the result from invoking your member function. Having to explicitly specify that result type is something of a liability of this technique, but at least IMO, its strengths substantially outweigh that minor liability.
Probably the most obvious strength is that the definition of std::transform is (reasonably) well known and standardized, so anybody who knows C++ (at all well) immediately has a pretty fair idea of what this is supposed to do as a whole--invoke some function on every item in a range, and write the returned value to some destination.
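For comparison, on a later compiler (VS2010 predates this), a C++11 range-based for loop expresses the same intent without naming iterators at all; this sketch assumes the element type and getter from the question:

for (CustomClass* c : customClassCustomVectorArray)
    std::cout << c->getCustomMember() << '\n';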

Is stability of std::remove and std::remove_if design fail?

Recently (from an SO comment) I learned that std::remove and std::remove_if are stable. Am I wrong to think this is a terrible design choice, since it prevents certain optimizations?
Imagine removing the first and the fifth elements of a 1M-element std::vector. Because of stability, we can't implement remove with swaps; instead we must shift every remaining element. :(
If we weren't limited by stability we could (for random-access and bidirectional iterators) practically have two iterators, one from the front and one from the back, and use swaps to bring the to-be-removed items to the end. I'm sure smart people could do even better. My question is general, not about the specific optimization I describe.
EDIT: please note that C++ advertises the zero-overhead principle, and also that there are both std::sort and std::stable_sort algorithms.
EDIT2:
The optimization would be something like the following. For remove_if:
bad_iter looks from the beginning for elements for which the predicate returns true.
good_iter looks from the end for elements for which the predicate returns false.
When both have found what they are looking for, they swap their elements. Termination is when good_iter <= bad_iter.
If it helps, think of it like the partitioning scan in quicksort, except that instead of comparing elements to a pivot we use the above predicate. (A sketch of this follows.)
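Here is a sketch of that two-iterator idea, as a hypothetical unstable_remove_if (not a standard function). It swaps doomed elements toward the back and returns the new logical end; it requires bidirectional iterators:

#include <algorithm>  // std::swap (also in <utility> since C++11)

template <class BidiIt, class Pred>
BidiIt unstable_remove_if(BidiIt first, BidiIt last, Pred pred)
{
    while (true) {
        while (first != last && !pred(*first)) ++first;       // bad_iter: scan forward for a match
        if (first == last) return first;
        do { --last; } while (first != last && pred(*last));  // good_iter: scan back for a keeper
        if (first == last) return first;
        std::swap(*first, *last);                             // move the keeper into the hole
        ++first;
    }
}

Used like remove_if: erase from the returned iterator to the container's end.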
EDIT3: I played around and tried to find the worst case (the worst case for remove_if; notice how rarely the predicate is true) and I got this:
#include <vector>
#include <string>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <chrono>
#include <cstdlib>  // rand
using namespace std;

int main()
{
    vector<string> vsp;
    int n;
    cin >> n;
    for (int i = 0; i < n; ++i)
    {
        string s = "123456";
        s.push_back('a' + (rand() % 26));
        vsp.push_back(s);
    }
    auto vsp2 = vsp;

    auto remove_start = std::chrono::high_resolution_clock::now();
    auto it = remove_if(begin(vsp), end(vsp), [](const string& s){ return s < "123456b"; });
    vsp.erase(it, vsp.end());
    cout << vsp.size() << endl;
    auto remove_end = std::chrono::high_resolution_clock::now();
    cout << "erase-remove: " << chrono::duration_cast<std::chrono::milliseconds>(remove_end - remove_start).count() << " milliseconds\n";

    auto partition_start = std::chrono::high_resolution_clock::now();
    auto it2 = partition(begin(vsp2), end(vsp2), [](const string& s){ return s >= "123456b"; });
    vsp2.erase(it2, vsp2.end());
    cout << vsp2.size() << endl;
    auto partition_end = std::chrono::high_resolution_clock::now();
    cout << "partition-remove: " << chrono::duration_cast<std::chrono::milliseconds>(partition_end - partition_start).count() << " milliseconds\n";
}
C:\STL\MinGW>g++ test_int.cpp -O2 && a.exe
12345678
11870995
erase-remove: 1426 milliseconds
11870995
partition-remove: 658 milliseconds
For other usages, partition is a bit faster, the same, or slower. Color me puzzled. :D
I assume you're asking about a hypothetical definition of stable_remove to be what remove currently is, and remove to be implemented however the implementer thinks best to give the correct values in any order, with the expectation that implementers will be able to improve on simply doing exactly the same as stable_remove.
In practice, the library can't easily do this optimization. It depends on the data, but you don't want to spend too long working out how many elements will be removed before deciding how to remove each one. For example, you could do an extra pass to count them, but there are plenty of cases where that extra pass is inefficient. Just because an unstable remove is faster than a stable one for certain cases doesn't necessarily mean that an adaptive algorithm choosing between the two is a good bet.
I think the difference between remove and sort is that sorting is known to be a complicated problem with a lot of different solutions, trade-offs, and tweaks. All "simple" sort algorithms are slow on average. Most standard algorithms are pretty simple; remove is one of them, but sort is not. I don't think it makes a lot of sense, therefore, to define stable_remove and remove as separate standard functions.
Edit: your edit, with my tweak (similar to std::partition but with no need to keep the values on the right), seems pretty reasonable to me. It requires a bidirectional iterator, but there is precedent in the standard for algorithms that behave differently for different iterator categories, such as std::distance. So it would be possible for the standard to define an unstable_remove that requires only a forward iterator, but does your thing if it gets a bidirectional iterator. The standard probably wouldn't lay out the algorithm, but it could have a phrase like "if the iterator is bidirectional, does at most min(k, n-k) moves, where k is the number of elements removed", which would in effect force it. But note that the standard doesn't currently say how many moves remove_if does, so I reckon that pinning this down simply wasn't a priority.
There is of course nothing stopping you from implementing your own unstable_remove.
If we accept that the standard didn't need to specify an unstable remove, the question then comes down to whether the function it does define should have been called stable_remove, anticipating a future remove that behaves differently for bidirectional iterators, and that might behave differently for forward iterators if some clever heuristic for doing an unstable remove ever becomes well known enough to be worth a standard function. I'd say not: it is not a disaster if the names of standard functions aren't completely regular, and it would have been pretty disruptive to remove the guarantee of stability from the STL's remove_if. Then the question becomes "why didn't the STL call it stable_remove_if?", to which I can only answer that, in addition to all the points made in all the answers, the STL design process was a sight quicker than the standardization process.
stable_remove would also open a can of worms regarding other standard functions that could in theory have unstable versions. For a particularly silly example, should copy be called stable_copy, just in case some implementation exists on which it's demonstrably faster to reverse the order of elements while copying? Should copy be called copy_forward, so that the implementation can choose which of copy_backward and copy_forward is called by copy according to which is faster? Part of the committee's job is to draw a line somewhere.
I think realistically the current standard is sensible, and it would be sensible to separately define a stable_remove and a remove_with_some_other_constraints, but remove_in_some_unspecified_way just doesn't offer the same opportunity for optimization that sort_in_some_unspecified_way does. Introsort was invented in 1997, just as C++ was being standardized, but I don't imagine the research effort around remove is quite what it was, and is, around sort. I may be wrong; optimizing remove might be the next big thing, and if so the committee has missed a trick.
std::remove is specified to work with forward iterators.
The approach with working with a pair of iterators, from beginning and from the end, would either increase the requirements for the iterators and thus decrease the utility of the function or violate/worsen asymptotic complexity guarantees.
To answer my own question more than 3 years later :)
Yes, it was a "fail".
There is a proposal, D0041R0, that would add unstable_remove.
One could argue that just because there is a proposal to add std::unstable_remove, that does not mean std::remove was a mistake, but I disagree. :)

Deleting user-defined elements in the middle of a vector

I'm coding a program where I want to draw a card and then delete it so that it doesn't get drawn again.
I have a vector of Cards (a class containing 2 structs that define Suit and Value) called deck, and I don't really know how to use iterators very well. Here's a code snippet:
void Player::discardCard(CardDeck masterDeck)
{
    cout << "Erasing: " << masterDeck.getDeck().at(cardSelect).toString() << endl;
    /* Attempt 1 */
    masterDeck.getDeck().erase(masterDeck.getDeck().begin() + cardSelect);
    /* Attempt 2 */
    vector<Card>::iterator itr;
    itr = masterDeck.getDeck().begin() + cardSelect;
    masterDeck.getDeck().erase(itr);
}
cardSelect has the location of the card I'm going to delete.
It's generated randomly between 0 and the size of the deck; therefore it shouldn't be pointing to a position out of bounds.
Every time I run it, I get the following error:
"Expression: vector erase iterator outside range"
I really don't know what to do; hopefully someone can help me. Thanks in advance!
My bet is that getDeck returns the vector by value. That causes itr to point into one copy of the vector while erase operates on another, which is why you get the error. You should return the vector by reference. Change getDeck's signature to this:
vector<Card>& getDeck()
Let me go off topic first. Your design is a little suspect: passing in CardDeck by value is almost certainly not what you want, but that's even beside the point. Why should your Player class have all this inside knowledge about the private innards of CardDeck? It shouldn't care whether you store the deck as a vector or a deque (ha ha), or what the structure is. It just shouldn't know that. All it knows is that it wants to discard a card:
masterDeck.Discard(selectedCard);
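A minimal sketch of what such a member might look like, assuming CardDeck keeps a private std::vector<Card> called deck (these names are hypothetical):

#include <cstddef>
#include <vector>

class CardDeck {
    std::vector<Card> deck;  // Card as defined in the question
public:
    void Discard(std::size_t index)
    {
        if (index < deck.size())               // guard against out-of-range indices
            deck.erase(deck.begin() + index);  // the vector shifts later elements down
    }
};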
Also note that selectedCard has to be between 0 and ONE LESS than the size of the deck, but even that's probably not your problem (although it will be, 1/53rd of the time).
So to answer your question we really would need to know a little more about masterDeck. Did you implement a valid custom copy constructor? Since you're passing by value, odds are good you're not correctly copying the underlying vector; in fact it's probably empty, and none of the deletes will work. Try checking the size. If you don't ever want the deck copied, you can let the compiler help you by declaring a private copy constructor and never defining it. See Scott Meyers' Effective C++, Item 11.
Finally, one last piece of advice: once you erase with your iterator, you invalidate it. Erasing from a vector invalidates every iterator at or after the point of erasure, since the remaining elements shift down. I'm just telling you so that you don't try to call erase more than once on the same iterator. One of the tricky things about iterators is how easy it is to invalidate them, which is why you often see checks for iter != coll.end().
"It's generated randomly within the boundaries of 0 and the size of deck".
The valid range should be "between 0 and the size of the deck minus 1". The current code can generate a range error at run time.
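In code, a correctly bounded pick might look like this sketch; taking std::rand() modulo deck.size() keeps the index strictly below the size:

#include <cstddef>
#include <cstdlib>
#include <vector>

// Returns a valid index in [0, deck.size() - 1]; assumes a non-empty deck.
std::size_t pickCard(const std::vector<Card>& deck)
{
    return std::rand() % deck.size();
}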

what happens when you modify an element of an std::set?

If I change an element of an std::set, for example through an iterator, I know it is not "reinserted" or "resorted", but is there any mention of whether it triggers undefined behavior? For example, I would imagine insertions would screw up. Is there any mention of specifically what happens?
You should not edit the values stored in the set directly. I copied this from the MSDN documentation, which is somewhat authoritative:
The STL container class set is used for the storage and retrieval of data from a collection in which the values of the elements contained are unique and serve as the key values according to which the data is automatically ordered. The value of an element in a set may not be changed directly. Instead, you must delete old values and insert elements with new values.
Why this is so is pretty easy to understand. The set implementation has no way of knowing you have modified the value behind its back. The usual implementation is a red-black tree. Having changed the value, the position in the tree for that instance will be wrong, and you would expect to see all manner of wrong behaviour, such as existence queries returning the wrong result because the search goes down the wrong branch of the tree.
The precise answer is platform dependent, but as a general rule, a "key" (the thing you put in a set, or the first type of a map) is supposed to be immutable. To put it simply, it must not be modified, and there is no such thing as automatic re-insertion.
More precisely, the member variables used to compare the key must not be modified.
The Windows VC compiler is quite flexible (tested with VC8) and this code compiles:
// creation
std::set<int> toto;
toto.insert(4);
toto.insert(40);
toto.insert(25);
// bad modif
(*toto.begin()) = 100;
// output
for (std::set<int>::iterator it = toto.begin(); it != toto.end(); ++it)
{
    std::cout << *it << " ";
}
std::cout << std::endl;
The output is 100 25 40, which is obviously not sorted... Bad...
Still, such behavior is useful when you want to modify data that does not participate in operator<. But you'd better know what you're doing: that's the price you pay for being too flexible.
Some might prefer gcc's behavior (tested with 3.4.4), which gives the error "assignment of read-only location". You can work around it with a const_cast:
const_cast<int&>(*toto.begin())=100;
That's now compiling on gcc as well, same output: 100 25 40.
But at least doing so will probably make you wonder what's happening, then go to Stack Overflow and see this thread :-)
You cannot do this; they are const. There exists no method by which the set can detect you making a change to the internal element, and as a result you cannot do so. Instead, you have to remove and reinsert the element. If you are using elements that are expensive to copy, you may have to switch to using pointers and custom comparators (or switch to a C++1x compiler that supports rvalue references, which would make things a whole lot nicer).
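A minimal sketch of the delete-and-reinsert idiom (plain erase/insert; C++17 later added set::extract to avoid the copy, but this works everywhere):

#include <set>

// Replaces old_value with new_value without breaking the set's ordering invariant.
void update(std::set<int>& s, int old_value, int new_value)
{
    std::set<int>::iterator it = s.find(old_value);
    if (it != s.end()) {
        s.erase(it);          // remove under the old key
        s.insert(new_value);  // reinsert so it lands in the correct position
    }
}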

STL sorted set where the conditions of order may change

I have a C++ STL set with a custom ordering defined.
The idea was that when items get added to the set, they're naturally ordered as I want them.
However, what I've just realised is that the ordering predicate can change as time goes by.
Presumably, the items in the set will then no longer be in order.
So two questions really:
Is it harmful for the items to be out of order? Am I right in saying the worst that can happen is that new entries may get put into the wrong place (which I can actually live with)? Or could this cause crashes, lost entries, etc.?
Is there a way to "refresh" the ordering of the set? You can't seem to use std::sort() on a set. The best I can come up with is dumping the contents out to a temporary container and re-adding them.
Any ideas?
Thanks,
John
set uses the ordering to look up items. If you insert N items under ordering1 and then insert an item under ordering2, the set cannot tell whether that item is already present.
It will violate the class invariant that every item is in there only once.
So yes, it does harm.
The only safe way to do this with the STL is to create a new set with the changed predicate. For example, you could do something like this when you need to re-sort the set under a new predicate:
std::set<int, NewPred> newset(oldset.begin(), oldset.end(), NewPred()); // the comparator type is part of the set's type
This is actually implementation dependent.
An STL implementation can, and usually will, assume that the predicate used for sorting is consistent (otherwise "sorted" would not even be well defined). It is at least possible to construct a valid STL implementation that formats your hard drive when you change the behavior of the predicate instance.
So, yes, you need to re-insert the items into a new set.
Alternatively, you could build your own container, e.g. a vector kept ordered with sort, plus lower_bound for binary search. Then you could re-sort whenever the predicate's behavior changes.
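A sketch of that sorted-vector approach, with the predicate passed in explicitly: re-sort whenever the predicate's behaviour changes, and lookups stay a binary search under the current ordering:

#include <algorithm>
#include <vector>

// Re-establish the ordering invariant after the predicate changes behaviour.
template <class Pred>
void refresh(std::vector<int>& items, Pred pred)
{
    std::sort(items.begin(), items.end(), pred);
}

// Valid as long as 'items' is currently sorted under 'pred'.
template <class Pred>
bool contains(const std::vector<int>& items, int value, Pred pred)
{
    return std::binary_search(items.begin(), items.end(), value, pred);
}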
I agree with the other answers, that this is going to break in some strange and hard to debug ways. If you go the refresh route, you only need to do the copy once. Create a tmp set with the new sorting strategy, add each element from the original set to the tmp set, then do
orig.swap(tmp);
This will swap the internals of the sets.
If this were me, I would wrap this up in a new class that handles all of the details, so that you can change implementations as needed. Depending on your access patterns and the number of times the sort order changes, the previously mentioned vector/sort/lower_bound solution may be preferable.
If you can live with an unordered set, then why are you adding them into a set in the first place?
The only case I can think of is where you just want to make sure the list is unique when you add them. If that's the case then you could use a temporary set to protect additions:
if (ts.insert(value).second) {
    // insertion took place
    realContainer.push_back(value);
}
An alternative, depending on how frequently you'll be modifying the entries in the set: you can test whether the modified entry would end up in a different position (by using the set's comparison function), and if the position would change, remove the old entry and re-add the new one.
As everyone else has pointed out, having the set unordered really smells bad, and I would also guess that it's possibly undefined behaviour according to the standard.
While this doesn't give you exactly what you want, boost::multi_index gives you similar functionality. Due to the way templates work, you will never be able to "change" the ordering predicate for a container; it is set in stone at compile time, unless you are using a sorted vector or something similar, where you are the one maintaining the invariant and can sort it however you want at any given time.
Multi_index however gives you a way to order a set of elements based on multiple ordering predicates at the same time. You can then select views of the container that behave like an std::set ordered by the predicate that you care about at the time.
This can cause lost entries. When searching for an element in a set, the ordering operator is used; this means that if an element was placed to the left of the root, and the ordering operator now says it's to the right, then that element will no longer be found.
Here's a simple test for you:
#include <iostream>
#include <set>
#include <functional>

struct comparer : public std::binary_function<int, int, bool>
{
    static enum CompareType { CT_LESS, CT_GREATER } CompareMode;
    bool operator()(int lhs, int rhs) const
    {
        if (CompareMode == CT_LESS)
        {
            return lhs < rhs;
        }
        else
        {
            return lhs > rhs;
        }
    }
};
comparer::CompareType comparer::CompareMode = comparer::CT_LESS;

typedef std::set<int, comparer> is_compare_t;

void check(const is_compare_t &is, int v)
{
    is_compare_t::const_iterator it = is.find(v);
    if (it != is.end())
    {
        std::cout << "HAS " << v << std::endl;
    }
    else
    {
        std::cout << "ERROR NO " << v << std::endl;
    }
}

int main()
{
    is_compare_t is;
    is.insert(20);
    is.insert(5);
    check(is, 5);
    comparer::CompareMode = comparer::CT_GREATER;
    check(is, 5);
    is.insert(27);
    check(is, 27);
    comparer::CompareMode = comparer::CT_LESS;
    check(is, 5);
    check(is, 27);
    return 0;
}
So, basically, if you intend to be able to find the elements you once inserted, you should not change the predicate used for insertions and finds.
Just a follow-up:
While running this code, the Visual Studio debug libraries started throwing exceptions complaining that the "<" operator was invalid.
So it does seem that changing the sort ordering is a bad thing. Thanks everyone!
1) Harmful, no. Crashes, no. The worst is indeed a non-sorted set.
2) "Refreshing" would be the same as re-adding anyway!