unordered set remove/insert in a loop - c++

The values of a std::unordered_set are not mutable, as they are both key and value. What is the correct way to modify an element of a std::unordered_set, if its element are first removed, modified and then reinserted? The erase does not invalidate any iterators, but the insert may. The obvious answer is to use the iterator, that the erase returns. One way, one could deal with this, I guess, is to reset the loop iterator to the beginning of an unordered_set, after a successful insert. I'd like to make sure, it is the only way.

The erase does not invalidate any iterators, but the insert may.
You can always check beforehand if insert is going to do that:
If rehashing occurs due to the insertion, all iterators are
invalidated. Otherwise iterators are not affected. References are not
invalidated. Rehashing occurs only if the new number of elements is
equal to or greater than max_load_factor()*bucket_count().
(from cppreference)
So, if you watch out for rehashing, your approach could work. Of course it leaves you with the problem of what to do when you detect that rehashing will occur.
You can reduce the probability of rehashing by increasing the capacity of the set before the loop.
The simplest way of dealing with rehashing is to start over after rehashing. Maybe there are other ways to deal with it but I wouldn't risk it.
Having said all these, what you are describing here seems to indicate that you probably need another container. If the unordered_set is really the best container for your application, I would most likely still go with Martin's solution, namely with an intermediate container. It's less messy and I can see what's going on; I can reason about correctness.

I think that you are better off using an intermediate container, like this:
unordered_set<int> original;
...
vector<int> temporary;
for (auto it = original.begin(), itEnd = original.end(); it != itEnd; ) {
if (...) {
int newValue = ...;
auto toDelete = it++;
original.erase(toDelete);
temporary.push_back(newValue);
} else {
...
++it;
}
}
original.insert(temporary.begin(), end.begin());

Use a temporary unordered_set that you will fill iterating on the first one (with an hint) then swap the original set with the temporary.

Related

erasing nlohmann::json object during iteration causes segmentation fault

I have a simple database consisting of objects with strings containing unix time as keys and strings containing instructions as values
I want to iterate though the database and erase any object who's key is smaller that current time ( so erase objects with dates before current date)
for (auto it = m_jsonData.begin(); it != m_jsonData.end(); it++) {
if (std::stoi(it.key()) <= (std::time(NULL))) {
std::cout << "command is behind schedule, removing\n";
m_jsonData.erase(it);
} else {
/*
*/
}
}
this code works fine as long as m_jsonData.erase(it); isn't invoked. when it does, in the next iteration std::stoi(it.key()) causes a segfault, after a bit of playing with it I came to a conclusion that is somehow loses track of what it's actually iterating. Is my conclusion true? If not then what is? And how do I fix it?
It's extremely normal for mutating container operations to invalidate iterators. It's one of the first things you should check for.
Documentation for nlohnmann::json::erase():
Notes
Invalidates iterators and references at or after the point of the erase, including the end() iterator.
References and iterators to the erased elements are invalidated. Other references and iterators are not affected.
That means after this line:
m_jsonData.erase(it);
the iterator it can't be used for anything including incrementing it to the next element. It is invalid.
Fortunately, the documentation also points out that the successor to the removed element is returned, so you can just write
for (auto it = m_jsonData.begin(); it != m_jsonData.end(); ) {
if (std::stoi(it.key()) <= (std::time(NULL))) {
it = m_jsonData.erase(it);
} else {
++it;
}
}
Note that when I say this is extremely normal, it's because the standard containers often have similar behaviour. See the documentation for examples, but this is something everyone should be aware of:
std::vector::erase Iterator invalidation
std::unordered_map::erase Iterator invalidation
etc.
This is exactly the reason std::erase was added in C++20, and previously std::remove_if was provided to suppport the erase(remove_if(...), end) idiom, instead of writing fragile mutating loops.

c++ vector object .erase

I have been struggling to put a vector object into a project im doing
I have read what little i could find about doing this and decided to give it a go.
std::vector<BrickFalling> fell;
BrickFalling *f1;
I created the vector. This next piece works fine until i get to the erase
section.
if(brickFall == true){
f1 = new BrickFalling;
f1->getBrickXY(brickfallx,brickfally);
fell.push_back(*f1);
brickFall = false;
}
// Now setup an iterator loop through the vector
vector<BrickFalling>::iterator it;
for( it = fell.begin(); it != fell.end(); ++it ) {
// For each BrickFalling, print out their info
it->printBrickFallingInfo(brick,window,deadBrick);
//This is the part im doing wrong /////
if(deadBrick == true)// if dead brick erase
{
BrickFalling[it].erase;//not sure what im supposed to be doing here
deadBrick = false;
}
}
You can totally avoid the issue by using std::remove_if along with vector::erase.
auto it =
std::remove_if(fell.begin(), fell.end(), [&](BrickFalling& b)
{ bool deadBrick = false;
b.printBrickFallingInfo(brick,window,deadBrick);
return deadBrick; });
fell.erase(it, fell.end());
This avoids the hand-writing of the loop.
In general, you should strive to write erasure loops for sequence containers in this fashion. The reason is that it is very easy to get into the "invalid iterator" scenario when writing the loop yourself, i.e. not remembering to reseat your looping iterator each time an erase is done.
The only issue with your code which I do not know about is the printBrickFallingInfo function. If it throws an exception, you may introduce a bug during the erasure process. In that case, you may want to protect the call with a try/catch block to ensure you don't leave the function block too early.
Edit:
As the comment stated, your print... function could be doing too much work just to determine if a brick is falling. If you really are attempting to print stuff and do even more things that may cause some sort of side-effect, another approach similar in nature would be to use std::stable_partition.
With std::stable_partition you can "put on hold" the erasure and just move the elements to be erased at one position in the container (either at the beginning or at the end) all without invalidating those items. That's the main difference -- with std::stable_partition, all you would be doing is move the items to be processed, but the items after movement are still valid. Not so with std::remove and std::remove_if -- moved items are just invalid and any attempt to use those items as if they are still valid is undefined behavior.
auto it =
std::stable_partition(fell.begin(), fell.end(), [&](BrickFalling& b)
{ bool deadBrick = false;
b.printBrickFallingInfo(brick,window,deadBrick);
return deadBrick; });
// if you need to do something with the moved items besides
// erasing them, you can do so. The moved items start from
// fell.begin() up to the iterator it.
//...
//...
// Now we erase the items since we're done with them
fell.erase(fell.begin(), it);
The difference here is that the items we will eventually erase will lie to the left of the partitioning iterator it, so our erase() call will remove the items starting from the beginning. In addition to that, the items are still perfectly valid entries, so you can work with them in any way you wish before you finally erase them.
The other answer detailing the use of remove_if should be used whenever possible. If, however, your situations does not allow you to write your code using remove_if, which can happen in more complicated situations, you can use the following:
You can use vector::erase with an iterator to remove the element at that spot. The iterator used is then invalidated. erase returns a new iterator that points to the next element, so you can use that iterator to continue.
What you end up with is a loop like:
for( it = fell.begin(); it != fell.end(); /* iterator updated in loop */ )
{
if (shouldDelete)
it = fell.erase(it);
else
++it;
}

Erasing item in a for(-each) auto loop

Is there a way to erase specific elements when using a auto variable in a for loop like this?
for(auto a: m_Connections)
{
if(something)
{
//Erase this element
}
}
I know I can either do say
for(auto it=m_map.begin() ...
or
for(map<int,int>::iterator it=m_map.begin() ...
and manually increment the iterator (and erase) but if I could do it with less lines of code I'd be happier.
Thanks!
You can't. A range-based loop makes a simple iteration over a range simpler, but doesn't support anything that invalidates either the range, or the iterator it uses. Of course, even if that were supported, you couldn't efficiently erase an element without access to the iterator.
You'll need an old-school loop, along the lines of
for (auto it = container.begin(); it != container.end();) {
if (something) {
it = container.erase(it);
} else {
++it;
}
}
or a combination of container.erase() and std::remove_if, if you like that sort of thing.
No, there isn't. Range based for loop is used to access each element of a container once.
Every time an element is removed from the container, iterators at or after the erased element are no longer valid (and given the implementation of the range-based-for this is a problem).
You should use the normal for loop (or a while) if you need to modify the container as you go along.
If you want to erase elements for which a predicate returns true, a good way is:
m_Connections.erase(
std::remove_if(m_Connections.begin(),
m_Connections.end(),
[](Type elem) { return predicate(elem); }),
m_Connections.end());
std::remove_if doesn't mix iteration logic with the predicate.
You need the iterator if you want to erase an element from a container.
And you can't get the iterator from the element itself -- and even if you could, for instance with vector, the iterator that range-based for internally uses would be invalidated in the next step causing undefined behavior.
So the answer is: No, in its classic usage you can't. range-based for was solely designed for convenient iteration of all elements in a range.
push all elements into array and then do pop operation to remove the item

Keeping std::list iterators valid through insertion

Note: This is not a question whether I should "use list or deque". It's a question about the validity of iterators in the face of insert().
This may be a simple question and I'm just too dense to see the right way to do this. I'm implementing (for better or worse) a network traffic buffer as a std::list<char> buf, and I'm maintaining my current read position as an iterator readpos.
When I add data, I do something like
buf.insert(buf.end(), newdata.begin(), newdata.end());
My question is now, how do I keep the readpos iterator valid? If it points to the middle of the old buf, then it should be fine (by the iterator guarantees for std::list), but typically I may have read and processed all data and I have readpos == buf.end(). After the insertion, I want readpos always to point to the next unread character, which in case of the insertion should be the first inserted one.
Any suggestions? (Short of changing the buffer to a std::deque<char>, which appears to be much better suited to the task, as suggested below.)
Update: From a quick test with GCC4.4 I observe that deque and list behave differently with respect to readpos = buf.end(): After inserting at the end, readpos is broken in a list, but points to the next element in a deque. Is this a standard guarantee?
(According to cplusplus, any deque::insert() invalidated all iterators. That's no good. Maybe using a counter is better than an iterator to track a position in a deque?)
if (readpos == buf.begin())
{
buf.insert(buf.end(), newdata.begin(), newdata.end());
readpos = buf.begin();
}
else
{
--readpos;
buf.insert(buf.end(), newdata.begin(), newdata.end());
++readpos;
}
Not elegant, but it should work.
From http://www.sgi.com/tech/stl/List.html
"Lists have the important property that insertion and splicing do not invalidate iterators to list elements, and that even removal invalidates only the iterators that point to the elements that are removed."
Therefore, readpos should still be valid after the insert.
However...
std::list< char > is a very inefficient way to solve this problem. Each byte you store in a std::list requires a pointer to keep track of the byte, plus the size of the list node structure, two more pointers usually. That is at least 12 or 24 bytes (32 or 64-bit) of memory used to keep track of a single byte of data.
std::deque< char> is probably a better container for this. Like std::vector it provides constant time insertions at the back however it also provides constant time removal at the front. Finally, like std::vector std::deque is a random-access container so you can use offsets/indexes instead of iterators. These three features make it an efficient choice.
I was indeed being dense. The standard gives us all the tools we need. Specifically, the sequence container requirements 23.2.3/9 say:
The iterator returned from a.insert(p, i, j) points to the copy of the first element inserted into a, or p if i == j.
Next, the description of list::insert says (23.3.5.4/1):
Does not affect the validity of iterators and references.
So in fact if pos is my current iterator inside the list which is being consumed, I can say:
auto it = buf.insert(buf.end(), newdata.begin(), newdata.end());
if (pos == buf.end()) { pos = it; }
The range of new elements in my list is [it, buf.end()), and the range of yet unprocessed elements is [pos, buf.end()). This works because if pos was equal to buf.end() before the insertion, then it still is after the insertion, since insertion does not invalidate any iterators, not even the end.
list<char> is a very inefficient way to store a string. It is probably 10-20 times larger than the string itself, plus you are chasing a pointer for every character...
Have you considered using std::dequeue<char> instead?
[edit]
To answer your actual question, adding and removing elements does not invalidate iterators in a list... But end() is still going to be end(). So you would need to check for that as a special case at the point where you insert the new element in order to update your readpos iterator.

Problem with invalidation of STL iterators when calling erase

The STL standard defines that when an erase occurs on containers such as std::deque, std::list etc iterators are invalidated.
My question is as follows, assuming the list of integers contained in a std::deque, and a pair of indicies indicating a range of elements in the std::deque, what is the correct way to delete all even elements?
So far I have the following, however the problem here is that the assumed end is invalidated after an erase:
#include <cstddef>
#include <deque>
int main()
{
std::deque<int> deq;
for (int i = 0; i < 100; deq.push_back(i++));
// range, 11th to 51st element
std::pair<std::size_t,std::size_t> r(10,50);
std::deque<int>::iterator it = deq.begin() + r.first;
std::deque<int>::iterator end = deq.begin() + r.second;
while (it != end)
{
if (*it % 2 == 0)
{
it = deq.erase(it);
}
else
++it;
}
return 0;
}
Examining how std::remove_if is implemented, it seems there is a very costly copy/shift down process going on.
Is there a more efficient way of achieving the above without all the copy/shifts
In general is deleting/erasing an element more expensive than swapping it with the next nth value in the sequence (where n is the number of elements deleted/removed so far)
Note: Answers should assume the sequence size is quite large, +1mil elements and that on average 1/3 of elements would be up for erasure.
I'd use the Erase-Remove Idiom. I think the Wikipedia article linked even shows what you're doing -- removing odd elements.
The copying that remove_if does is no more costly than what happens when you delete elements from the middle of the container. It might even be more efficient.
Calling .erase() also results in "a very costly copy/shift down process going on.". When you erase an element from the middle of the container, every other element after that point must be shifted down one spot into the available space. If you erase multiple elements, you incur that cost for every erased element. Some of the non-erased elements will move several spots, but are forced to move one spot at a time instead of all at once. That is very inefficient.
The standard library algorithms std::remove and std::remove_if optimize this work. They use a clever trick to ensure that every moved element is only moved once. This is much, much faster than what you are doing yourself, contrary to your intuition.
The pseudocode is like this:
read_location <- beginning of range.
write_location <- beginning of range.
while read_location != end of range:
if the element at read_location should be kept in the container:
copy the element at the read_location to the write_location.
increment the write_location.
increment the read_location.
As you can see, every element in the original sequence is considered exactly once, and if it needs to be kept, it gets copied exactly once, to the current write_location. It will never be looked at again, because the write_location can never run in front of the read_location.
Remember that deque is a contiguous memory container (like vector, and probably sharing implementation), so removing elements mid-container necessarily means copying subsequent elements over the hole. You just want to make sure you're doing one iteration and copying each not-to-be-deleted object directly to its final position, rather than moving all objects one by one during each delete. remove_if is efficient and appropriate in this regard, your erase loop is not: it does massive amounts of unnecessary copying.
FWIW - alternatives:
add a "deleted" state to your objects and mark them deleted in place, but then every time you operate on the container you'll need to check yourself
use a list, which is implemented using pointers to previous and next elements, such that removing a list element alters the adjacent points to bypass that element: no copying, efficient iteration, just no random access, more small (i.e. inefficient) heap allocations and pointer overheads
What to choose depends on the nature, relative frequency, and performance requirements of specific operations (e.g. it may be that you can afford slow removals if they're done at non-critical times, but need fastest-possible iteration - whatever it is, make sure you understand your needs and the implications of the various data structures).
One alternative that hasn't been mentioned is to create a new deque, copy the elements that you want to keep into it, and swap it with the old deque.
void filter(std::deque<int>& in, std::pair<std::size_t,std::size_t> range) {
std::deque<int> out;
std::deque<int>::const_iterator first = in.begin();
std::deque<int>::const_iterator curr = first + range.first;
std::deque<int>::const_iterator last = first + range.second;
out.reserve(in.size() - (range.second-range.first));
std::copy(first, curr, std::back_inserter(out));
while (curr != last) {
if (*curr & 1) {
out.push_back(*curr);
}
++curr;
}
std::copy(last, in.end(), std::back_inserter(out));
in.swap(out);
}
I'm not sure if you have enough memory to create a copy, but it usually is faster and easier to make a copy instead of trying to inline erase elements from a large collection. If you still see memory thrashing, then figure out how many elements you are going to keep by calling std::count_if and reserve that many. This way you would have a single memory allocation.