Modifying a data structure while iterating over it - c++

What happens when you add elements to a data structure such as a vector while iterating over it? Can I not do this?
I tried this and it breaks:
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> x = { 1, 2, 3 };
    int j = 0;
    for (auto it = x.begin(); it != x.end(); ++it) {
        x.push_back(j);
        j++;
        cout << j << " .. ";
    }
}

Iterators are invalidated by some operations that modify a std::vector.
Other containers have various rules about when iterators are and are not invalidated; there is a post (by yours truly) with the details.
By the way, the entry-point function main() MUST return int:
int main() { ... }

What happens when you add elements to a data structure such as a vector while iterating over it? Can I not do this?
The iterators become invalid if the vector reallocates its storage. So you're safe as long as the vector doesn't resize itself (strictly speaking, push_back invalidates the past-the-end iterator even when no reallocation happens).
I would suggest you avoid this.
The short explanation of why resizing invalidates iterators:
Initially the vector has some capacity (which you can query by calling vector::capacity()). As you add elements and it becomes full, it allocates a larger block of memory, copies the elements from the old memory to the newly allocated memory, and then frees the old memory. The problem is that the iterator still points into the old memory, which has been deallocated. That is how resizing invalidates iterators.
Here is a simple demonstration. Just watch when the capacity changes:
std::vector<int> v;
for (int i = 0; i < 100; i++)
{
    std::cout << "size = " << v.size() << ", capacity = " << v.capacity() << std::endl;
    v.push_back(i);
}
Partial Output:
size = 0, capacity = 0
size = 1, capacity = 1
size = 2, capacity = 2
size = 3, capacity = 4
size = 4, capacity = 4
size = 5, capacity = 8
size = 6, capacity = 8
size = 7, capacity = 8
size = 8, capacity = 8
size = 9, capacity = 16
size = 10, capacity = 16
See the complete output here: http://ideone.com/rQfWe
Note: capacity() tells you the maximum number of elements the vector can contain without allocating new memory, and size() tells you the number of elements the vector currently contains.

It's not a good idea to do it.
Consider the case where your vector needs to be resized after a push_back: it is moved to a bigger block of memory, and your iterators are now invalid.

It's a bad idea in general, because if the vector is resized, the iterator will become invalid (it's wrapping a pointer into the vector's memory).
It's also not clear what your code is really trying to do. If the iterator somehow didn't become invalid (suppose it was implemented as an index), I'd expect you to have an infinite loop there - the end would never be reached because you're always adding elements.
Assuming you want to loop over the original elements, and add one for each, one solution would be to add the new elements to a second vector, and then concatenate that at the end:
vector<int> temp;
// ...
// Inside loop, do this:
temp.push_back(j);
// ...
// After loop, do this to insert all new elements onto end of x
x.insert(x.end(), temp.begin(), temp.end());
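Put together, a minimal runnable sketch of that approach (assuming, per the question, that you want to append one new element per original element) might look like:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> x = { 1, 2, 3 };
    std::vector<int> temp;

    int j = 0;
    for (auto it = x.begin(); it != x.end(); ++it) {
        temp.push_back(j);   // new elements go to temp; x's iterators stay valid
        j++;
    }

    // After the loop: append all the new elements onto the end of x at once.
    x.insert(x.end(), temp.begin(), temp.end());

    for (int v : x) std::cout << v << " ";   // prints: 1 2 3 0 1 2
    std::cout << "\n";
}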

While you used vector as an example, there are other STL containers that can have elements pushed back without invalidating iterators. Pushing back an element into a std::list doesn't require any reallocation of existing elements, as they aren't stored contiguously (a list instead consists of nodes linked together by pointers to the next node); iterators therefore remain valid, because the node they internally point to still resides at the same address.
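A small sketch to illustrate: an iterator obtained before a push_back stays usable afterwards, because list nodes never move:
#include <iostream>
#include <list>

int main() {
    std::list<int> l = { 1, 2, 3 };
    auto it = l.begin();        // refers to the node holding 1
    l.push_back(4);             // allocates a new node; existing nodes don't move
    std::cout << *it << "\n";   // still valid, prints 1
}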

If you need to do it this way, you can reserve() up front the maximum number of records you could add. This stops the vector from needing to resize, which should prevent the crashes.
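A minimal sketch of that idea (the * 10 transformation is just for illustration); using indexes rather than iterators also sidesteps the fact that push_back formally invalidates the end iterator even when capacity suffices:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> x = { 1, 2, 3 };
    x.reserve(x.size() * 2);    // room for one new element per original element

    const std::size_t originalSize = x.size();
    for (std::size_t i = 0; i < originalSize; ++i)
        x.push_back(x[i] * 10); // no reallocation: capacity was reserved above

    for (int v : x) std::cout << v << " ";   // prints: 1 2 3 10 20 30
    std::cout << "\n";
}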

Related

Searching using unordered map vs array

I have a block allocation function that takes in an array and searches through it for values of 0, which indicate free space, and then allocates blocks to the available free space. I am trying to use an unordered map to improve the speed of searching for 0s. In my function, all the elements of the array are inserted into the unordered map. I was wondering if implementing the unordered map like below even improves the search speed compared to just using the array?
int arr[] = {15, 11, 0, 0, 0, 27, 0, 0}; // example array
int n = sizeof(arr) / sizeof(arr[0]);
unordered_map<int, int> hash;
for (int i = n - 1; i >= 0; i--)
{
    hash[i + 1] = arr[i];
}
for (auto v : hash)
{
    if (v.second == 0)
    {
        return v.second;
    }
}
And the plain array version:
int arr[] = {15, 11, 0, 0, 0, 27, 0, 0};
int n = sizeof(arr) / sizeof(arr[0]);
for (int i = 0; i < n; i++)
{
    if (arr[i] == 0)
    {
        return arr[i];
    }
}
First, note that both functions as you've written them always return zero, which is not what you want.
Now, to answer the main question: No, this approach doesn't help. In both cases you're just iterating over the values in the array until you hit on one that's a zero. This is an O(n) operation in the worst case, and introducing an unordered_map here is just slowing things down.
The most similar thing you could do here that would actually help would be something like
std::unordered_map<int, std::vector<int>> lookup;
for (int i = 0; i < n; i++)
{
    lookup[arr[i]].push_back(i);
}
Now if you want to find a block with a zero in it, you just grab an element from lookup[0].
However, given that we only need to track the blocks with zeroes in them, and not immediately look up the blocks with, say, a 13 in them, we may as well just do:
std::vector<int> emptyBlocks;
for (int i = 0; i < n; i++)
{
    if (arr[i] == 0) { emptyBlocks.push_back(i); }
}
and then we can just grab empty blocks as we need them.
Note that you should take blocks from the back of emptyBlocks so that deleting them from the list doesn't require us to shift everything over. If you need to take the smallest indices first for some reason, traverse arr backwards when building the list of empty blocks.
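For example, a usage sketch with emptyBlocks built as above:
// Claim the next free block in O(1), taking from the back.
int block = emptyBlocks.back();
emptyBlocks.pop_back();
// ... mark arr[block] as allocated ...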
That said, when you're allocating blocks typically you're trying to find a range of consecutive empty blocks. If that's the case, what you likely want is a way to look up the starting point of blocks of a given size. And you probably want it to be ordered, too, so that you can ask for "the smallest block at least this large."
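A minimal sketch of that last idea; freeRuns and allocate are illustrative names, and this assumes the free runs have already been scanned out of arr:
#include <map>

// Maps run length -> starting block index. A multimap, since several
// free runs may have the same length.
std::multimap<int, int> freeRuns;

// Find the smallest free run of at least `need` blocks.
int allocate(int need)
{
    auto it = freeRuns.lower_bound(need);   // ordered lookup: "at least this large"
    if (it == freeRuns.end())
        return -1;                          // no run is big enough
    int len   = it->first;
    int start = it->second;
    freeRuns.erase(it);
    if (len > need)                         // reinsert the unused tail of the run
        freeRuns.emplace(len - need, start + need);
    return start;
}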

Vector size increases only by one when inserting multiple elements

I'm currently learning C++ and for my current goal I want to fill values from vector A into vector X. Vector X must not be larger than 20. To achieve this I have a function that checks whether the current vector has space left. If no space is left, it replaces the current vector with a new vector built from a template (which already contains some entries that have to be the same for all the vectors):
int ChunkedBufferBuilder::checkNewBuffer() {
    int space_left = chunk_size - super::current.size();
    if (space_left == 0) {
        bufVec.push_back(super::current);
        super::current = tpl;
        space_left = chunk_size - super::current.size();
    }
    return space_left;
}
bufVec is another vector, holding all the vectors that have reached a size of 20. This function always returns the remaining number of "free" entries in the current vector.
To avoid having to insert every single value from the input vector into the smaller vectors one by one, I'm trying to use the insert function here:
ChunkedBufferBuilder* ChunkedBufferBuilder::push_back(const std::vector<uint8_t> &vec) {
    auto pos = 0;
    const auto size = vec.size();
    while (pos < size) {
        const int space_left = checkNewBuffer();
        const int to = (space_left > size - pos) ? size : pos + space_left;
        auto fromPtr = vec.at(pos);
        auto toPtr = vec.at(to);
        int size = super::current.size();
        super::current.insert(super::current.end(), fromPtr, toPtr);
        size = super::current.size();
        pos = to;
    }
    return this;
}
Note that the second size = super::current.size() was placed there by me to help with debugging. In the following images I have set two breakpoints: one on the line where insert is called, and the other on the pos = to assignment.
The documentation of std::vector states that:
The vector is extended by inserting new elements before the element at the specified position, effectively increasing the container size by the number of elements inserted.
Thus I expect the size to increase by the number of elements that I added. But when I run the debugger:
I get these values at the first breakpoint:
And at the second breakpoint size only increased by one:
However on the second pass of the while loop, it then tries to insert another twelve values (size is still 8 so 20 - 8 == 12):
And then suddenly the size jumps to 22:
This is currently breaking my program and I'm pretty much clueless why my code behaves in the way it currently does.
In this snippet:
auto fromPtr = vec.at(pos);
auto toPtr = vec.at(to);
super::current.insert(super::current.end(), fromPtr, toPtr);
you are inserting fromPtr copies of the value toPtr (i.e. overload (3)). From your description, it appears you mean to use overload (4):
super::current.insert(super::current.end(), vec.begin() + pos, vec.begin() + to);
which copies elements from vec to super::current.
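For reference, a small sketch contrasting the two overloads (v and src are illustrative names):
#include <vector>

int main() {
    std::vector<int> v   = { 1, 2, 3 };
    std::vector<int> src = { 7, 8, 9 };

    v.insert(v.end(), 2, 42);                       // overload (3): count + value, appends 42, 42
    v.insert(v.end(), src.begin() + 1, src.end());  // overload (4): iterator range, appends 8, 9
}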

Is there a better way of moving elements in a vector

I have a std::vector of elements and would like to move one element to a specified position.
I already have a solution, but I'm curious whether there is a better way to do it.
Let's assume I'd like to move the last element to the index pos;
I could do a
auto posToInsert = vecElements.begin();
std::advance(posToInsert, pos);
vecElements.insert(posToInsert, *vecElements.rbegin());
vecElements.erase(vecElements.end() - 1);
but this may reallocate memory.
Sadly a
std::move(vecElements.rbegin(), vecElements.rbegin(), posToInsert);
doesn't do the trick.
My current solution does some swaps, but no new memory allocation:
auto newElement = vecElements.rbegin();
for (auto currentPos = vecElements.size() - 1; currentPos != pos; --currentPos, ++newElement)
    newElement->swap(*(newElement + 1)); // reverse iterator + 1 = the element before
To clarify it, because @NathanOliver asked: the ordering of the remaining elements should be preserved.
Is there a better way of doing it?
You could use std::rotate:
#include <algorithm>
#include <vector>
#include <iostream>

int main()
{
    std::vector<int> values{1, 2, 3, 4, 5};
    std::rotate(values.begin() + 2, values.end() - 1, values.end());
    for (int i : values)
        std::cout << i << " ";
    std::cout << "\n";
}
Outputs:
1 2 5 3 4
You can probably adjust the iterators used if you need to move an element that isn't at the end.
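For instance, a hedged generalization using std::rotate (moveElement is an illustrative name) that moves the element at index from to index to, preserving the order of everything else and allocating nothing:
#include <algorithm>
#include <vector>

void moveElement(std::vector<int>& v, std::size_t from, std::size_t to)
{
    if (from < to)       // shift the elements in between one slot to the left
        std::rotate(v.begin() + from, v.begin() + from + 1, v.begin() + to + 1);
    else if (to < from)  // shift the elements in between one slot to the right
        std::rotate(v.begin() + to, v.begin() + from, v.begin() + from + 1);
}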
Move the element out of the vector, erase the element, then insert. This is guaranteed not to reallocate, since reallocation only happens when the size would exceed capacity(), and that can't happen here because erasing first guarantees that size() <= capacity() - 1.
In the case of moving the last element, that would look like
auto temp = std::move(vecElements.back());
vecElements.pop_back();
vecElements.insert(posToInsert, std::move(temp));
So this costs you two moves of the element (plus the shifting that insert performs) and no reallocation.

How to iterate through a list while adding items to it

I have a list of line segments (a std::vector<std::pair<int, int>>) that I'd like to iterate through and subdivide. The algorithm would be, in pseudocode:
for segment in vectorOfSegments:
    firstPoint = segment.first;
    secondPoint = segment.second;
    newMidPoint = (firstPoint + secondPoint) / 2.0
    vectorOfSegments.remove(segment);
    vectorOfSegments.push_back(std::make_pair(firstPoint, newMidPoint));
    vectorOfSegments.push_back(std::make_pair(newMidPoint, secondPoint));
The issue that I'm running into is how I can push_back new elements (and remove the old elements) without iterating over this list forever.
It seems like the best approach may be to make a copy of this vector first, and use the copy as a reference, clear() the original vector, and then push_back the new elements to the recently emptied vector.
Is there a better approach to this?
It seems like the best approach may be to make a copy of this vector first, and use the copy as a reference, clear() the original vector, and then push_back the new elements to the recently emptied vector.
Almost. You don't need to copy-and-clear; move instead!
// Move data from `vectorOfSegments` into new vector `original`.
// This is an O(1) operation that more than likely just swaps
// two pointers.
std::vector<std::pair<int, int>> original{std::move(vectorOfSegments)};
// Original vector is now in "a valid but unspecified state".
// Let's run `clear()` to get it into a specified state, BUT
// all its elements have already been moved! So this should be
// extremely cheap if not a no-op.
vectorOfSegments.clear();
// We expect twice as many elements to be added to `vectorOfSegments`
// as it had before. Let's reserve some space for them to get
// optimal behaviour.
vectorOfSegments.reserve(original.size() * 2);
// Now iterate over `original`, adding to `vectorOfSegments`...
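A possible completion of that final loop, reusing the subdivision from the question's pseudocode:
for (const auto& segment : original) {
    int newMidPoint = (segment.first + segment.second) / 2;
    vectorOfSegments.emplace_back(segment.first, newMidPoint);
    vectorOfSegments.emplace_back(newMidPoint, segment.second);
}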
Don't remove elements while you insert new segments. Then, when finished with inserting you could remove the originals:
int len = vectorOfSegments.size();
for (int i = 0; i < len; i++)
{
    std::pair<int,int>& segment = vectorOfSegments[i];
    int firstPoint = segment.first;
    int secondPoint = segment.second;
    int newMidPoint = (firstPoint + secondPoint) / 2;
    vectorOfSegments.push_back(std::make_pair(firstPoint, newMidPoint));
    vectorOfSegments.push_back(std::make_pair(newMidPoint, secondPoint));
}
vectorOfSegments.erase(vectorOfSegments.begin(), vectorOfSegments.begin() + len);
Or, if you want to replace one segment by two new segments in one pass, you could use iterators like here:
for (auto it = vectorOfSegments.begin(); it != vectorOfSegments.end(); ++it)
{
    std::pair<int,int>& segment = *it;
    int firstPoint = segment.first;
    int secondPoint = segment.second;
    int newMidPoint = (firstPoint + secondPoint) / 2;
    it = vectorOfSegments.erase(it);
    it = vectorOfSegments.insert(it, std::make_pair(firstPoint, newMidPoint));
    it = vectorOfSegments.insert(it + 1, std::make_pair(newMidPoint, secondPoint));
}
As Lightness Races in Orbit pointed out, you should call reserve before either of these approaches: in the first case reserve(vectorOfSegments.size()*3), in the latter reserve(vectorOfSegments.size()*2+1).
This is easiest solved by using an explicit index variable like this:
for (size_t i = 0; i < segments.size(); i++) {
    ... //other code
    if (/*condition when to split segments*/) {
        Point midpoint = ...;
        segments[i] = Segment(..., midpoint); //replace the segment by the first subsegment
        segments.emplace_back(Segment(midpoint, ...)); //add the second subsegment to the end of the vector
        i--; //reconsider the first subsegment
    }
}
Notes:
segments.size() is called in each iteration of the loop, so we really reconsider all appended segments.
The explicit index means that the std::vector<> is free to reallocate in the emplace_back() call, there are no iterators/pointers/references that can become invalid.
I assumed that you don't care about the order of your vector, because you add the new segments to its end. If you do care, you might want to use a linked list to avoid the quadratic complexity of your algorithm, since insertion into or deletion from the middle of a std::vector<> has linear complexity (see the sketch at the end of this answer). In my code I avoid insertion/deletion by replacing the old segment.
Another approach to retain order would be to ignore order at first and then reestablish order via sorting. Assuming a good sorting algorithm, that is O(n*log(n)) which is still better than the naive O(n^2) but worse than the O(n) of the linked list approach.
If you don't want to reconsider the new segments, just use a constant size and omit the counter decrement:
size_t count = segments.size();
for (size_t i = 0; i < count; i++) {
    ... //other code
    if (/*condition when to split segments*/) {
        Point midpoint = ...;
        segments[i] = Segment(..., midpoint); //replace the segment by the first subsegment
        segments.emplace_back(Segment(midpoint, ...)); //add the second subsegment to the end of the vector
    }
}
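For completeness, here is a sketch of the linked-list variant mentioned in the notes above. std::list iterators stay valid across insertions and erasures elsewhere in the list, so each segment can be replaced in place, in O(n) overall, while preserving order:
#include <iostream>
#include <list>
#include <utility>

int main() {
    std::list<std::pair<int, int>> segments = { {0, 10}, {10, 30} };

    for (auto it = segments.begin(); it != segments.end(); ) {
        int mid = (it->first + it->second) / 2;
        segments.insert(it, { it->first, mid });  // insert both halves before *it;
        segments.insert(it, { mid, it->second }); // other iterators stay valid
        it = segments.erase(it);                  // drop the original, advance
    }

    for (const auto& s : segments)
        std::cout << "[" << s.first << "," << s.second << ") ";
    std::cout << "\n";   // [0,5) [5,10) [10,20) [20,30)
}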

Erasing multiple objects from a std::vector?

Here is my issue: let's say I have a std::vector with ints in it,
for example 50, 90, 40, 90, 80, 60, 80.
I know I need to remove the second, fifth and third elements. I don't necessarily always know the order of the elements to remove, nor how many there are. The issue is that erasing an element changes the indexes of the other elements. So how could I erase these while compensating for the index changes? (Sorting and then erasing linearly with an offset is not an option.)
Thanks
I am offering several methods:
1. A fast method that does not retain the original order of the elements:
Assign the current last element of the vector to the element to erase, then erase the last element. This avoids big moves, and all indexes except the last remain constant. If you start erasing from the back, all precomputed indexes will remain correct.
void quickDelete(int idx)
{
    vec[idx] = vec.back();
    vec.pop_back();
}
I see this is essentially a hand-coded version of the erase-remove idiom pointed out by Klaim ...
2. A slower method that retains the original order of the elements:
Step 1: Mark all vector elements to be deleted, i.e. with a special value. This has O(|indexes to delete|).
Step 2: Erase all marked elements using v.erase( remove (v.begin(), v.end(), special_value), v.end() );. This has O(|vector v|).
The total run time is thus O(|vector v|), assuming the index list is shorter than the vector.
3. Another slower method that retains the original order of the elements:
Use a predicate and remove_if, as described in https://stackoverflow.com/a/3487742/280314. To make this efficient, and to respect the requirement of not "sorting then linearly erasing with an offset", my idea is to implement the predicate using a hash table, and to adjust the indexes stored in the hash table as the deletion proceeds each time the predicate returns true, as Klaim suggested.
Using a predicate and the algorithm remove_if you can achieve what you want : see http://www.cplusplus.com/reference/algorithm/remove_if/
Don't forget to erase the item (see remove-erase idiom).
Your predicate will simply hold the idx of each value to remove and decrease all indexes it keeps each time it returns true.
That said if you can afford just removing each object using the remove-erase idiom, just make your life simple by doing it.
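A minimal sketch of that predicate idea, using a running position counter rather than decrementing stored indexes (eraseByIndex is an illustrative name; this relies on remove_if visiting elements in order, which holds for the sequential algorithm):
#include <algorithm>
#include <unordered_set>
#include <vector>

// Remove the elements whose original positions are listed in `indexes`.
void eraseByIndex(std::vector<int>& v, const std::unordered_set<int>& indexes)
{
    int pos = 0;
    v.erase(std::remove_if(v.begin(), v.end(),
                           [&](const int&) { return indexes.count(pos++) > 0; }),
            v.end());
}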
Erase the items backwards. In other words, erase the highest index first, then the next highest, etc. You won't invalidate any previous iterators or indexes, so you can just use the obvious approach of multiple erase calls.
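A short sketch of that backwards approach (eraseDescending is an illustrative name):
#include <algorithm>
#include <functional>
#include <vector>

// Erase by index, highest index first, so the positions of the
// still-to-be-erased elements are never shifted by earlier erasures.
void eraseDescending(std::vector<int>& vec, std::vector<int> indexes)
{
    std::sort(indexes.begin(), indexes.end(), std::greater<int>());
    for (int idx : indexes)
        vec.erase(vec.begin() + idx);
}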
I would move the elements which you don't want to erase to a temporary vector and then replace the original vector with this.
While this answer by Peter G. in variant one (the swap-and-pop technique) is the fastest when you do not need to preserve the order of the elements, here is the unmentioned alternative, which does maintain the order.
With C++17 and C++20, the removal of multiple elements from a vector is possible with standard algorithms. The run time is O(N * log(N)) due to std::stable_partition. There are no external helper arrays and no excessive copying; everything is done in place. The code is a "one-liner":
template <class T>
inline void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
    v.resize(std::distance(
        v.begin(),
        std::stable_partition(v.begin(), v.end(),
            [&selection, &v](const T& item) {
                return !std::binary_search(
                    selection.begin(),
                    selection.end(),
                    static_cast<int>(static_cast<const T*>(&item) - &v[0]));
            })));
}
The code above assumes that selection vector is sorted (if it is not the case, std::sort over it does the job, obviously).
To break this down, let us declare a number of temporaries:
// We need an explicit item index of an element
// to see if it should be in the output or not
int itemIndex = 0;
// The checker lambda returns `true` if the element is NOT in `selection`
// (i.e. if it should be kept)
auto filter = [&itemIndex, &selection](const T& item) {
    return !std::binary_search(
        selection.begin(),
        selection.end(),
        itemIndex++);
};
This checker lambda is then fed to the std::stable_partition algorithm, which is guaranteed to call the lambda only once for each element of the original (unpermuted!) array v.
auto end_of_selected = std::stable_partition(
    v.begin(),
    v.end(),
    filter);
The end_of_selected iterator points right after the last element that should remain in the output array, so we can now resize v down. To calculate the number of remaining elements we use std::distance to get a size_t from the two iterators.
v.resize(std::distance(v.begin(), end_of_selected));
This breakdown is slightly different from the code at the top: it uses itemIndex to keep track of the element's position. To get rid of itemIndex, the one-liner captures a reference to the source array v and uses pointer arithmetic to calculate the index internally.
Over the years (on this and other similar sites) multiple solutions have been proposed, but usually they employ multiple "raw loops" with conditions and some erase/insert/push_back calls. The idea behind stable_partition is explained beautifully in this talk by Sean Parent.
This link provides a similar solution (one that does not assume selection is sorted: std::find_if is used instead of std::binary_search), but it also employs a helper (incremented) variable, which rules out parallelizing the processing of larger arrays.
Starting with C++17, std::stable_partition accepts a new first argument (an ExecutionPolicy) which allows auto-parallelization of the algorithm, further reducing the run time for big arrays. To convince yourself that this parallelization actually works, there is another talk, by Hartmut Kaiser, explaining the internals.
Would this work:
void DeleteAll(vector<int>& data, const vector<int>& deleteIndices)
{
    vector<bool> markedElements(data.size(), false);
    vector<int> tempBuffer;
    tempBuffer.reserve(data.size() - deleteIndices.size());

    for (vector<int>::const_iterator itDel = deleteIndices.begin(); itDel != deleteIndices.end(); ++itDel)
        markedElements[*itDel] = true;

    for (size_t i = 0; i < data.size(); i++)
    {
        if (!markedElements[i])
            tempBuffer.push_back(data[i]);
    }
    data = tempBuffer;
}
It's an O(n) operation no matter how many elements you delete. You could gain some efficiency by reordering the vector in place (but I think this way it's more readable).
This is non-trivial because, as you delete elements from the vector, the indexes of the remaining elements change:
[0] hi
[1] you
[2] foo
>> delete [1]
[0] hi
[1] foo
If you keep a counter of how many elements you have already deleted, and if your list of indexes to delete is sorted in ascending order, then:
int counter = 0;
for (int k : IndexesToDelete) {
    events.erase(events.begin() + k + counter);
    counter -= 1;
}
You can use this method if the order of the remaining elements doesn't matter:
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<int> vec;
    vec.push_back(1);
    vec.push_back(-6);
    vec.push_back(3);
    vec.push_back(4);
    vec.push_back(7);
    vec.push_back(9);
    vec.push_back(14);
    vec.push_back(25);

    cout << "The elements before" << endl;
    for (size_t i = 0; i < vec.size(); i++) cout << vec[i] << endl;

    vector<bool> toDeleted;
    int YesOrNo = 0;
    for (size_t i = 0; i < vec.size(); i++)
    {
        cout << "Do you need to delete this element? " << vec[i] << ", if yes enter 1 else enter 0" << endl;
        cin >> YesOrNo;
        if (YesOrNo)
            toDeleted.push_back(true);
        else
            toDeleted.push_back(false);
    }

    // Deleting, beginning from the last element and moving to the first
    for (int i = static_cast<int>(toDeleted.size()) - 1; i >= 0; i--)
    {
        if (toDeleted[i])
        {
            vec[i] = vec.back();
            vec.pop_back();
        }
    }

    cout << "The elements after" << endl;
    for (size_t i = 0; i < vec.size(); i++) cout << vec[i] << endl;
    return 0;
}
Here's an elegant solution in case you want to preserve the indices until the end: the idea is to replace the values you want to delete with a special value that is guaranteed not to be used anywhere, and then, at the very end, perform the erase itself:
std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9};
// marking 3 elements to be deleted
vec[2] = std::numeric_limits<int>::lowest();
vec[5] = std::numeric_limits<int>::lowest();
vec[3] = std::numeric_limits<int>::lowest();
// erase
vec.erase(std::remove(vec.begin(), vec.end(), std::numeric_limits<int>::lowest()), vec.end());
// print values => 1 2 5 7 8 9
for (const auto& value : vec) std::cout << ' ' << value;
std::cout << std::endl;
It's very quick if you delete a lot of elements, because the deletion itself happens only once. Items can also be deleted in any order that way.
If you use a struct instead of an int, then you can still mark an element of that struct, e.g. dead = true, and then use remove_if instead of remove:
struct MyObj
{
int x;
bool dead = false;
};
std::vector<MyObj> objs = {{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}};
objs[2].dead = true;
objs[5].dead = true;
objs[3].dead = true;
objs.erase(std::remove_if(objs.begin(), objs.end(), [](const MyObj& obj) { return obj.dead; }), objs.end());
// print values => 1 2 5 7 8 9
for (const auto& obj : objs) std::cout << ' ' << obj.x;
std::cout << std::endl;
This one is a bit slower, at around 80% of the speed of the remove version.